How to quickly get a list of keys from dict - python

I construct a dictionary from an excel sheet and end up with something like:
d = {('a','b','c'): val1, ('a','d'): val2}
The tuples I use as keys contain a handful of values, the goal is to get a list of these values which occur more than a certain number of times.
I've tried two solutions, both of which take entirely too long.
Attempt 1, simple list comprehension filter:
keyList = []
for k in d.keys():
    keyList.extend(list(k))
# The script makes it to here before hanging
commonkeylist = [key for key in keyList if keyList.count(key) > 5]
This takes forever since list.count() traverses the list on each iteration of the comprehension.
Attempt 2, create a count dictionary
keyList = []
keydict = {}
for k in d.keys():
    keyList.extend(list(k))
# The script makes it to here before hanging
for k in keyList:
    if k in keydict.keys():
        keydict[k] += 1
    else:
        keydict[k] = 1
commonkeylist = [k for k in keyList if keydict[k] > 50]
I thought this would be faster since we only traverse all of keyList a handful of times, but it still hangs the script.
What other steps can I take to improve the efficiency of this operation?

Use collections.Counter() and a generator expression:
from collections import Counter
counts = Counter(item for key in d for item in key)
commonkeylist = [item for item, count in counts.most_common() if count > 50]
where iterating over the dictionary directly yields the keys without creating an intermediary list object.
Demo with a lower count filter:
>>> from collections import Counter
>>> d = {('a','b','c'): 'val1', ('a','d'): 'val2'}
>>> counts = Counter(item for key in d for item in key)
>>> counts
Counter({'a': 2, 'c': 1, 'b': 1, 'd': 1})
>>> [item for item, count in counts.most_common() if count > 1]
['a']

I thought this would be faster since we only traverse all of keyList a
handful of times, but it still hangs the script.
That's because you're still doing an O(n) search. Replace this:
for k in keyList:
    if k in keydict.keys():
with this:
for k in keyList:
    if k in keydict:
and see if that helps your 2nd attempt perform better.
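For reference, a minimal sketch of the corrected second attempt using collections.defaultdict, so the membership test and the manual initialization go away entirely (the example dict is the one from the question):

from collections import defaultdict

d = {('a', 'b', 'c'): 'val1', ('a', 'd'): 'val2'}  # the question's example dict

# Count every value that appears in any of the key tuples of d
keydict = defaultdict(int)
for k in d.keys():
    for item in k:
        keydict[item] += 1

# Keep only the values that occur more than 50 times
commonkeylist = [item for item, count in keydict.items() if count > 50]

Like the Counter answer above, this is a single pass over the data, and every lookup is a hash operation rather than a scan of a list.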

Related

Counting values in lists inside of a list

I want to count identical values across my lists inside a list.
I already coded it:
id_list = [['cat','animal'],['snake','animal'], ['rose','flower'], ['tomato','vegetable']]
duplicates = []
for x in range(len(id_list)):
    if id_list.count(id_list[x][1]) >= 2:
        duplicates.append(id_list[x][1])
print(duplicates)
I think it doesn't work because count() is looking for id_list[x][1] itself and doesn't see the values inside the other inner lists.
Is there any way to count across my lists instead of within a single list, based on this value?
Thanks for all help and advice.
Have a nice day!
You can get the count of all the elements from your list in a dictionary like this:
>>> id_list = [['cat','animal'],['snake','animal'], ['rose','flower'], ['tomato','vegetable']]
>>> {k: sum(id_list, []).count(k) for k in sum(id_list, [])}
{'cat': 1, 'animal': 2, 'snake': 1, 'rose': 1, 'flower': 1, 'tomato': 1, 'vegetable': 1}
You can extract the elements whose value (count) is greater than 1 to identify as duplicates.
Explanation: sum(id_list, []) basically flattens a list of lists; this works for any number of elements inside your inner lists. sum(id_list, []).count(k) counts every k inside this flattened list, and the dict comprehension stores it with k as the key and the count as the value. You can now iterate this dictionary and select only those elements whose count is greater than, let's say, 1:
my_dict = {k: sum(id_list, []).count(k) for k in sum(id_list, [])}
for key, count in my_dict.items():
    if count > 1:
        print(key)
or create the dictionary directly by:
flat_list = sum(id_list, [])
>>> {k: flat_list.count(k) for k in flat_list if flat_list.count(k) > 1}
{'animal': 2}
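As an aside, here is a minimal sketch of the same idea using collections.Counter (already used in other answers on this page), which flattens once and counts everything in a single pass instead of calling .count() repeatedly:

from collections import Counter

id_list = [['cat', 'animal'], ['snake', 'animal'], ['rose', 'flower'], ['tomato', 'vegetable']]

# Count every element across all inner lists in one pass
counts = Counter(item for inner in id_list for item in inner)

# Keep only the elements that appear more than once
duplicates = [item for item, count in counts.items() if count > 1]
print(duplicates)  # ['animal']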
How about this:
id_list = [['cat','animal'],['snake','animal'], ['rose','flower'], ['tomato','vegetable']]
els = [el[1] for el in id_list]
[k for k, v in {i: els.count(i) for i in els}.items() if v > 1]
['animal']

Python: List of pairs. Making every pair single and sum the values of the same keys

I have a list of pairs. The list contains items of the form [x, y]. I would like to make a list or dictionary with the left item as the key and the right item as the value. The list may contain the same key multiple times. I want to sum the values and keep each key only once.
E.g.
pairs[0] = ['3106124650', 2.86]
pairs[1] = ['3106124650', 8.86]
pairs[2] = ['5216154610', 23.77]
I want to keep '3106124650' once and sum its values, so my new list or dictionary will contain this key once with the value 11.72:
'3106124650', 11.72
Here's a way. For large datasets, numpy will probably be faster though.
import collections

result = collections.defaultdict(lambda: 0)
for k, v in pairs:
    result[k] += v
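A quick check, assuming pairs holds the three entries from the question:

import collections

pairs = [['3106124650', 2.86], ['3106124650', 8.86], ['5216154610', 23.77]]

result = collections.defaultdict(lambda: 0)
for k, v in pairs:
    result[k] += v

print(dict(result))  # {'3106124650': 11.72, '5216154610': 23.77}, give or take float rounding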
sumdict = dict()
for i, v in pairs:
    sumdict[i] = v + sumdict.get(i, 0)
li = [['a', 1], ['a', 2], ['b', 3], ['c', 4]]
d = {}
for w in li:
    d[w[0]] = w[1] + d.get(w[0], 0)
Output: {'a': 3, 'b': 3, 'c': 4}
You can try this:
d = {}
for entry in pairs:
    if entry[0] in d:
        d[entry[0]] += entry[1]
    else:
        d[entry[0]] = entry[1]

Count how many times items from list 1 appear in list 2

I have 2 lists:
1. ['a', 'b', 'c']
2. ['a', 'd', 'a', 'b']
And I want dictionary output like this:
{'a': 2, 'b': 1, 'c': 0}
I already made it:
# b = list #1
# words = list #2
c = {}
for i in b:
    c.update({i: words.count(i)})
But it is very slow, and I need to process a txt file of about 10 MB.
EDIT: Entire code; I'm currently testing, so there are unused imports.
import string
import os
import operator
import time
from collections import Counter

def getbookwords():
    a = open("wu.txt", encoding="utf-8")
    b = a.read().replace("\n", "").lower()
    a.close()
    b.translate(string.punctuation)
    b = b.split(" ")
    return b

def wordlist(words):
    a = open("wordlist.txt")
    b = a.read().lower()
    b = b.split("\n")
    a.close()
    t = time.time()
    #c = dict((i, words.count(i)) for i in b)
    c = Counter(words)
    result = {k: v for k, v in c.items() if k in set(b)}
    print(time.time() - t)
    sorted_d = sorted(c.items(), key=operator.itemgetter(1))
    return(sorted_d)

print(wordlist(getbookwords()))
Since speed is currently an issue, it might be worth considering not passing through the list for each thing you want to count. The set() function allows you to only use the unique keys in your list words.
An important thing to remember for speed in all cases is the line unique_words = set(b). Without this, an entire pass through your list is being done to create a set from b at every iteration in whichever kind of data structure you happen to use.
c = {k: 0 for k in set(words)}
for w in words:
    c[w] += 1
unique_words = set(b)
c = {k: c[k] for k in c if k in unique_words}
Alternatively, defaultdicts can be used to eliminate some of the initialization.
from collections import defaultdict

c = defaultdict(int)
for w in words:
    c[w] += 1
unique_words = set(b)
c = {k: c[k] for k in c if k in unique_words}
For completeness' sake, I do like the Counter-based solutions in the other answers (like the one from Reut Sharabani). The code is cleaner, and though I haven't benchmarked it, I wouldn't be surprised if a built-in counting class is faster than home-rolled solutions with dictionaries.
from collections import Counter
c = Counter(words)
unique_words = set(b)
c = {k:v for k, v in c.items() if k in unique_words}
Try using collections.Counter and move b to a set, not a list:
from collections import Counter
c = Counter(words)
b = set(b)
result = {k: v for k, v in c.items() if k in b}
Also, if you can read the words lazily and not create an intermediate list, that should be faster.
Counter provides the functionality you want (counting items), and filtering the result against a set uses hashing which should be a lot faster.
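A quick demo of that on the lists from the question (b and words named as in the question's comments):

from collections import Counter

b = ['a', 'b', 'c']           # list 1
words = ['a', 'd', 'a', 'b']  # list 2

c = Counter(words)
b = set(b)
result = {k: v for k, v in c.items() if k in b}
print(result)  # {'a': 2, 'b': 1} -- note 'c' is absent here rather than 0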
You can use collections.Counter on a generator that skips ignored keys using a set lookup.
from collections import Counter
keys = ['a', 'b', 'c']
lst = ['a', 'd', 'a', 'b']
unique_keys = set(keys)
count = Counter(x for x in lst if x in unique_keys)
print(count) # Counter({'a': 2, 'b': 1})
# count['c'] == 0
Note that count['c'] is not printed, but is still 0 by default in a Counter.
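If you need the zeros to show up explicitly, as in the desired output {'a': 2, 'b': 1, 'c': 0}, one option (much like the Counter-based answer further down) is to build the final dict from the keys list, relying on the fact that a Counter returns 0 for missing keys:

from collections import Counter

keys = ['a', 'b', 'c']
lst = ['a', 'd', 'a', 'b']

unique_keys = set(keys)
count = Counter(x for x in lst if x in unique_keys)
result = {k: count[k] for k in keys}  # Counter returns 0 for keys it never saw
print(result)  # {'a': 2, 'b': 1, 'c': 0}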
Here's an example I just coughed up in a REPL. Assuming you're not counting duplicates in list two, we create a hash table using a dictionary. For each item in the first list (the one we're matching against), we create a key-value pair with the item as the key and the value set to 0.
Next we iterate through the second list; for each value, we check whether the key has already been defined. If it has, we increment its value; otherwise we ignore it.
This is the least amount of iteration possible: you hit each item in each list only once.
x = [1, 2, 3, 4, 5]
z = [1, 2, 2, 2, 1]
y = {}
for n in x:
    y[n] = 0  # Set the value to zero for each item in the first list
for n in z:
    if n in y:  # If we defined the key in the hash already, increment it by one
        y[n] += 1
print(y)
@Makalone, the answers above are appreciable. You can also try the code sample below, which uses Python's Counter() from the collections module.
You can try it at http://rextester.com/OTYG56015.
Python code »
from collections import Counter
list1 = ['a', 'b', 'c']
list2 = ['a', 'd', 'a', 'b']
counter = Counter(list2)
d = {key: counter[key] for key in set(list1)}
print(d)
Output »
{'a': 2, 'c': 0, 'b': 1}

How to get a set of keys with largest values?

I am working on a function
def common_words(dictionary, N):
    if len(dictionary) > N:
        max(dictionary, key=dictionary.get)
Description of the function is:
The first parameter is the dictionary of word counts and the second is
a positive integer N. This function should update the dictionary so
that it includes the most common (highest frequency words). At most N
words should be included in the dictionary. If including all words
with some word count would result in a dictionary with more than N
words, then none of the words with that word count should be included.
(i.e., in the case of a tie for the N+1st most common word, omit all
of the words in the tie.)
So I know that I need to get the N items with the highest values, but I am not sure how to do that. I also know that once I get the N items, if there are any duplicate values at the cutoff, I need to pop them out.
For example, given
k = {'a':5, 'b':4, 'c':4, 'd':1}
then
common_words(k, 2)
should modify k so that it becomes {'a':5}.
Here's my algorithm for this problem.
Extract the data from the dictionary into a list and sort it in descending order on the dictionary values.
Clear the original dictionary.
Group the sorted data into groups that have the same value.
Re-populate the dictionary with all the (key, value) pairs from each group in the sorted list, as long as that keeps the total dictionary size <= N. If adding a group would make the total dictionary size > N, then return.
The grouping operation can be easily done using the standard itertools.groupby function.
To perform the sorting and grouping we need an appropriate key function, as described in the groupby, list and sorted docs. Since we need the second item of each tuple we could use
def keyfunc(t):
    return t[1]
or
keyfunc = lambda t: t[1]
but it's more efficient to use operator.itemgetter.
from operator import itemgetter
from itertools import groupby

def common_words(d, n):
    keyfunc = itemgetter(1)
    lst = sorted(d.items(), key=keyfunc, reverse=True)
    d.clear()
    for _, g in groupby(lst, key=keyfunc):
        g = list(g)
        if len(d) + len(g) <= n:
            d.update(g)
        else:
            break

# test
data = {'a': 5, 'b': 4, 'c': 4, 'd': 1}
common_words(data, 4)
print(data)
common_words(data, 2)
print(data)
output
{'c': 4, 'd': 1, 'b': 4, 'a': 5}
{'a': 5}
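For anyone unfamiliar with groupby, here is a small illustration of what the grouping step sees on the example data, using the same keyfunc:

from operator import itemgetter
from itertools import groupby

data = {'a': 5, 'b': 4, 'c': 4, 'd': 1}
keyfunc = itemgetter(1)
lst = sorted(data.items(), key=keyfunc, reverse=True)

for value, group in groupby(lst, key=keyfunc):
    print(value, list(group))
# 5 [('a', 5)]
# 4 [('b', 4), ('c', 4)]
# 1 [('d', 1)]

The tied words 'b' and 'c' arrive as one group, which is what lets the function accept or reject a whole tie at once.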
My algorithm is as below:
1. Build a list of tuples from the dictionary, sorted by value from largest to smallest.
2. Check whether item[N-1] matches item[N] in value; if yes, drop item[N-1] as well, and keep walking back while the tie continues (indexing starts from 0, hence the -1).
3. Finally, convert the slice of the tuple list up to N elements back into a dict; you may switch to an OrderedDict here if you want to retain the item order.
It will just return the dictionary as it is if the dictionary's length is less than N.
def common_words(dictionary, N):
    if len(dictionary) > N:
        tmp = [(k, dictionary[k]) for k in sorted(dictionary, key=dictionary.get, reverse=True)]
        while N > 0 and tmp[N-1][1] == tmp[N][1]:
            N -= 1
        return dict(tmp[:N])
        # return [i[0] for i in tmp[:N]]  # comment out the line above and uncomment this one to get keys only, as your title asks for keys
    else:
        return dictionary
        # return dictionary.keys()  # comment out the line above and uncomment this one to get keys only, as your title asks for keys
>>> common_words({'a':5, 'b':4, 'c':4, 'd':1}, 2)
{'a': 5}
The OP wants to modify the input dictionary within the function and return None, so it can be modified as below:
def common_words(dictionary, N):
    if len(dictionary) > N:
        tmp = [(k, dictionary[k]) for k in sorted(dictionary, key=dictionary.get, reverse=True)]
        while N > 0 and tmp[N-1][1] == tmp[N][1]:
            N -= 1
        # return dict(tmp[:N])
        for i in tmp[N:]:
            dictionary.pop(i[0])
>>> k = {'a':5, 'b':4, 'c':4, 'd':1}
>>> common_words(k, 2)
>>> k
{'a': 5}

How to add dictionary keys with defined values to a list

I'm trying to add only keys with a value >= n to my list; however, I can't give .keys() an argument.
n = 2
dict = {'a': 1, 'b': 2, 'c': 3}
for i in dict:
    if dict[i] >= n:
        list(dict.keys([i]))
When I try this, it tells me I can't give .keys() an argument. But if I remove the argument, all keys are added, regardless of value.
Any help?
You don't need to call the .keys() method of the dict, as you are already iterating over data_dict's keys using a for loop.
n = 2
data_dict = {'a': 1, 'b': 2, 'c': 3}
lst = []
for i in data_dict:
    if data_dict[i] >= n:
        lst.append(i)
print lst
Results:
['c', 'b']
You can also achieve this using list comprehension
result = [k for k, v in data_dict.iteritems() if v >= 2]
print result
You should read this: Iterating over Dictionaries.
Try using filter:
filtered_keys = filter(lambda x: d[x] >= n, d.keys())
Or using list comprehension:
filtered_keys = [x for x in d.keys() if d[x] >= n]
The error in your code is that dict.keys returns all keys, as the docs mention:
Return a copy of the dictionary’s list of keys.
What you want is one key at a time, which list comprehension gives you. Also, when filtering, which is basically what you do, consider using the appropriate method (filter).
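For reference, the same filtering in Python 3, where dict.iteritems() is gone and filter() returns an iterator instead of a list, would look roughly like this:

n = 2
d = {'a': 1, 'b': 2, 'c': 3}

# The list comprehension works unchanged in Python 3
filtered_keys = [k for k, v in d.items() if v >= n]

# filter() now returns an iterator, so wrap it in list() if you need a list
filtered_keys = list(filter(lambda k: d[k] >= n, d.keys()))

print(filtered_keys)  # ['b', 'c']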
