Say I have a dictionary called word_counter_dictionary that counts how many words are in the document in the form {'word' : number}. For example, the word "secondly" appears one time, so the key/value pair would be {'secondly' : 1}. I want to make an inverted list so that the numbers will become keys and the words will become the values for those keys so I can then graph the top 25 most used words. I saw somewhere where the setdefault() function might come in handy, but regardless I cannot use it because so far in the class I am in we have only covered get().
inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    inverted_dictionary[new_key] = word_counter_dictionary.get(new_key, '') + str(key)
inverted_dictionary
So far, using this method above, it works fine until it reaches another word with the same value. For example, the word "saves" also appears once in the document, so Python will add the new key/value pair just fine. BUT it erases the {1 : 'secondly'} with the new pair so that only {1 : 'saves'} is in the dictionary.
So, bottom line, my goal is to get ALL of the words and their respective number of repetitions in this new dictionary called inverted_dictionary.
A defaultdict is perfect for this:
word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
from collections import defaultdict
d = defaultdict(list)
for key, value in word_counter_dictionary.iteritems():
    d[value].append(key)
print(d)
Output:
defaultdict(<type 'list'>, {1: ['first'], 2: ['second', 'fourth'], 3: ['third']})
What you can do is turn each value into a list of the words that share the same key:
word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(str(key))
    else:
        inverted_dictionary[new_key] = [str(key)]
print inverted_dictionary
>>> {1: ['first'], 2: ['second', 'fourth'], 3: ['third']}
Python dicts do NOT allow repeated keys, so you can't use a simple dictionary to store multiple elements with the same key (1 in your case). For your example, I'd rather have a list as the value of your inverted dictionary, and store in that list the words that share the number of appearances, like:
inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(key)
    else:
        inverted_dictionary[new_key] = [key]
In order to get the 25 most repeated words, you should iterate through the (sorted) keys in the inverted_dictionary and store the words:
common_words = []
for key in sorted(inverted_dictionary.keys(), reverse=True):
    if len(common_words) < 25:
        common_words.extend(inverted_dictionary[key])
    else:
        break
common_words = common_words[:25] # In case there are more than 25 words
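Since the question says only get() has been covered so far, the same inversion can also be sketched with get() alone. This is a minimal illustration, assuming the small sample dictionary used in the other answers. It builds a new list on every step, which is fine for a class exercise but slower than append() on large inputs.

```python
word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}

inverted_dictionary = {}
for word, count in word_counter_dictionary.items():
    # get() returns the list stored so far for this count,
    # or an empty list if the count has not been seen yet
    inverted_dictionary[count] = inverted_dictionary.get(count, []) + [word]

print(inverted_dictionary)
```

Note that the lookup has to be done on inverted_dictionary, not on word_counter_dictionary, and the default has to be a list rather than a string for the words to stay separate.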
Here's a version that doesn't "invert" the dictionary:
>>> import operator
>>> A = {'a':10, 'b':843, 'c': 39, 'd': 10}
>>> B = sorted(A.iteritems(), key=operator.itemgetter(1), reverse=True)
>>> B
[('b', 843), ('c', 39), ('a', 10), ('d', 10)]
Instead, it creates a list that is sorted, highest to lowest, by value.
To get the top 25, you simply slice it: B[:25].
And here's one way to get the keys and values separated (after putting them into a list of tuples):
>>> [x[0] for x in B]
['b', 'c', 'a', 'd']
>>> [x[1] for x in B]
[843, 39, 10, 10]
or
>>> C, D = zip(*B)
>>> C
('b', 'c', 'a', 'd')
>>> D
(843, 39, 10, 10)
Note that if you only want to extract the keys or the values (and not both), you should do so earlier. These are just examples of how to handle the list of tuples.
For getting the largest elements of some dataset an inverted dictionary might not be the best data structure.
Either put the items in a sorted list (the example assumes you want the two most frequent words):
word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
counter_word_list = sorted((count, word) for word, count in word_counter_dictionary.items())
Result:
>>> print(counter_word_list[-2:])
[(2, 'second'), (3, 'third')]
Or use Python's included batteries (heapq.nlargest in this case):
import heapq, operator
print(heapq.nlargest(2, word_counter_dictionary.items(), key=operator.itemgetter(1)))
Result:
[('third', 3), ('second', 2)]
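Another option from the standard library, not mentioned above, is collections.Counter: it accepts an existing mapping of counts, and its most_common(n) method returns the n highest-count pairs. A sketch, assuming the same sample dictionary (the names counts and top_two are only illustrative):

```python
from collections import Counter

word_counter_dictionary = {'first': 1, 'second': 2, 'third': 3, 'fourth': 2}

# Counter accepts a mapping of element -> count
counts = Counter(word_counter_dictionary)

# most_common(n) returns (word, count) pairs, highest count first
top_two = counts.most_common(2)
print(top_two)
```

For the original question, most_common(25) would give the 25 most used words.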
Related
If I have a word like "hello", I want the program to generate a dictionary with the keys being the number of occurrences of the letters and values being a list of the letters.
So "hello" would generate {1: ['h', 'e', 'o'], 2: ["l"]}.
from collections import defaultdict, Counter
def occurrences(s):
    h = defaultdict(list)
    for k, v in Counter(s).items():
        h[v].append(k)
    return h
occurrences("hello")
Output
defaultdict(<class 'list'>, {1: ['h', 'e', 'o'], 2: ['l']})
A Counter is a dictionary in which missing keys count as zero: with c = Counter() you can do c[key] += 1 even if key isn't already in c. An additional benefit is that if you pass it an iterable, it builds the dictionary of counts in one step. A string is treated as a sequence of characters.
Thus, Counter("hello") is the dictionary Counter({'l': 2, 'h': 1, 'e': 1, 'o': 1})
It's this dictionary you are trying to "reverse".
Now, you just need to create a dictionary of lists, and to append the letters, where the key is the value in the preceding Counter.
There is another dictionary class, more or less like Counter: defaultdict. It lets you decide what the initial value will be. For instance, a defaultdict(list) has initial value [] (or equivalently, list()). So with h = defaultdict(list), you can do h[1].append("e") even if 1 is not already a key of h.
Note that both Counter and defaultdict are subclasses of dict.
See also the documentation of the collections module.
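The behaviours described above can be checked in a few lines. A small sketch (the variable names are only illustrative):

```python
from collections import Counter, defaultdict

c = Counter()
c['x'] += 1                    # a missing key counts as 0, so this works
print(c['x'])                  # 1
print(Counter("hello")['l'])   # 2

h = defaultdict(list)
h[1].append('e')               # a missing key starts as an empty list
print(h[1])                    # ['e']

# Both classes are subclasses of dict
print(issubclass(Counter, dict), issubclass(defaultdict, dict))
```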
There are several ways to do this. You need to create a dictionary with count as key and list as values. So you can use defaultdict.
from collections import Counter, defaultdict
def occurrences(s):
    inverted = defaultdict(list)
    for k, v in Counter(s).items():
        inverted[v].append(k)
    return inverted
The code block creates a special dictionary: if you access a key that was never defined, its value will be an empty list, so you can append values without initializing it first. Counter counts every character in the given string. For "hello", the output of Counter(s).items() is dict_items([('h', 1), ('e', 1), ('l', 2), ('o', 1)]), so we need to swap each key/value pair as you asked. For more information, see the collections library documentation.
I have an unbounded list of elements (they are rows in a document DB) containing dicts like:
list = [
    {a:''},
    {a:'', b:''},
    {a:'',b:'',c:''}
]
And I have an input element with an arbitrary number of keys, like:
{a:'', c:''}
I need a function that finds the index of the element sharing the most dict keys with the input.
In this case it would be list[2], because it contains both {a:''} and {c:''}.
Could you help me or point me to how to do it?
You can use the builtin max function and provide a matching key:
# The input to search over
l = [{'a':''}, {'a':'', 'b':''}, {'a':'','b':'','c':''}]
# Extract the keys we'd like to search for
t = set({'a': '', 'c': ''}.keys())
# Find the item in l that shares maximum number of keys with the requested item
match = max(l, key=lambda item: len(t & set(item.keys())))
To extract the index in one pass:
max_index = max(enumerate(l), key=lambda item: len(t & set(item[1].keys())))[0]
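Wrapped in a function, the one-pass version might look like the sketch below. The name best_match_index is hypothetical. Note that max returns the first maximal item, so ties go to the lowest index, and an empty list raises ValueError.

```python
def best_match_index(rows, query):
    """Return the index of the dict in rows sharing the most keys with query."""
    wanted = set(query)  # iterating a dict yields its keys
    return max(range(len(rows)), key=lambda i: len(wanted & set(rows[i])))

rows = [{'a': ''}, {'a': '', 'b': ''}, {'a': '', 'b': '', 'c': ''}]
print(best_match_index(rows, {'a': '', 'c': ''}))  # 2
```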
>>> lst = [{'a':'a'},{'a':'a','b':'b'},{'a':'a','b':'b','c':'c'}]
>>> seen = {}
>>> [seen.update({key:value}) for dct in lst for key,value in dict(dct).items() if key not in seen.keys()]
>>> seen
Output
{'a': 'a', 'c': 'c', 'b': 'b'}
I have this list made from a csv which is massive.
For every item in the list, I have broken it into its id and details. The id is always between 0 and 3 characters long and details is of variable length.
I created an empty dictionary, D...(rest of code below):
D={}
for v in list:
    id = v[0:3]
    details = v[3:]
    if id not in D:
        D[id] = {}
    if details not in D[id]:
        D[id][details] = 0
    D[id][details] += 1
aside: Can you help me understand what the two if statements are doing? Very new to python and programming.
Anyway, it produces something like this:
{'KEY1_1': {'key2_1' : value2_1, 'key2_2' : value2_2, 'key2_3' : value2_3},
'KEY1_2': {'key2_1' : value2_1, 'key2_2' : value2_2, 'key2_3' : value2_3},
and many more KEY1's with variable numbers of key2's
Each 'KEY1' is unique but each 'key2' isn't necessarily. The value2_x's are all different.
Ok so, right now I found a way to sort by the first KEY
for k, v in sorted(D.items()):
    print k, ':', v
I have done enough research to know that dictionaries can't really be sorted but I don't care about sorting, I care about ordering or more specifically frequencies of occurrence. In my code value2_x is the number of times its corresponding key2_x occurs for that particular KEY1_x. I am starting to think I should have used better variable names.
Question: How do I order the top-level/overall dictionary by the number in value2_x which is in the nested dictionary? I want to do some statistics to those numbers like...
How many times does the most frequent KEY1_x:key2_x pair show up?
What are the 10, 20, 30 most frequent KEY1_x:key2_x pairs?
Can I only do that by each KEY1 or can I do it overall? Bonus: If I could order it that way for presentation/sharing that would be very helpful because it is such a large data set. So much thanks in advance and I hope I've made my question and intent clear.
You could use Counter to order the key pairs based on their frequency. It also provides an easy way to get x most frequent items:
from collections import Counter
d = {
    'KEY1': {
        'key2_1': 5,
        'key2_2': 1,
        'key2_3': 3
    },
    'KEY2': {
        'key2_1': 2,
        'key2_2': 3,
        'key2_3': 4
    }
}
c = Counter()
for k, v in d.iteritems():
    c.update({(k, k1): v1 for k1, v1 in v.iteritems()})
print c.most_common(3)
Output:
[(('KEY1', 'key2_1'), 5), (('KEY2', 'key2_3'), 4), (('KEY2', 'key2_2'), 3)]
If you only care about the most common key pairs and have no other reason to build nested dictionary you could just use the following code:
from collections import Counter
l = ['foobar', 'foofoo', 'foobar', 'barfoo']
D = Counter((v[:3], v[3:]) for v in l)
print D.most_common() # [(('foo', 'bar'), 2), (('foo', 'foo'), 1), (('bar', 'foo'), 1)]
Short explanation: ((v[:3], v[3:]) for v in l) is a generator expression that generates tuples whose first item is the top-level key in your original dict and whose second item is the key in the nested dict.
>>> x = list((v[:3], v[3:]) for v in l)
>>> x
[('foo', 'bar'), ('foo', 'foo'), ('foo', 'bar'), ('bar', 'foo')]
Counter is a subclass of dict. It accepts an iterable as an argument; each unique element in the iterable is used as a key, and its value is the number of times that element occurs in the iterable.
>>> c = Counter(x)
>>> c
Counter({('foo', 'bar'): 2, ('foo', 'foo'): 1, ('bar', 'foo'): 1})
Since a generator expression is an iterable, there's no need to convert it to a list in between, so construction can simply be done with Counter((v[:3], v[3:]) for v in l).
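With the pairs counted this way, the statistics asked about in the question can be read off directly. A sketch using the same sample list (the names top_pair and top_count are only illustrative):

```python
from collections import Counter

l = ['foobar', 'foofoo', 'foobar', 'barfoo']
D = Counter((v[:3], v[3:]) for v in l)

# How many times does the most frequent pair show up?
top_pair, top_count = D.most_common(1)[0]
print(top_pair, top_count)   # ('foo', 'bar') 2

# The n most frequent pairs; n would be 10, 20 or 30 on the real data
print(D.most_common(2))
```

Because the Counter is keyed by (id, details) tuples, these answers are computed over the whole data set rather than per top-level key.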
The if statements you asked about are checking if the key exists in dict:
>>> d = {1: 'foo'}
>>> 1 in d
True
>>> 2 in d
False
So the following code checks whether a key equal to id exists in dict D and, if it doesn't, assigns an empty dict to it.
if id not in D:
    D[id] = {}
The second if does exactly the same for nested dictionaries.
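To see both checks at work, the loop from the question can be run on a small list. A sketch, assuming the sample strings from the Counter example as input; rows stands in for the real CSV data, and the variable key replaces id only to avoid shadowing the built-in id function:

```python
rows = ['foobar', 'foofoo', 'foobar', 'barfoo']

D = {}
for v in rows:
    key = v[0:3]
    details = v[3:]
    if key not in D:
        D[key] = {}            # first time this id is seen: create its nested dict
    if details not in D[key]:
        D[key][details] = 0    # first time this pair is seen: start its count at 0
    D[key][details] += 1       # safe now: both levels are guaranteed to exist

print(D)
```

Without the two checks, the first D[key][details] += 1 would raise a KeyError, because neither the outer nor the inner key exists yet.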
I have a nested list like:
list1 = [(A,0.75),(D,0.49),(Y,0.36)]
I have a reference nested list like:
mainlist = [(A,10),(B,20),(C,30),(D,40),(E,50).........,(Y,250),(Z,260)]
1. I want to search for key element A in mainlist.
2. Once A is found in mainlist, store the corresponding key,value pair in a new nested list.
3. Repeat 1 and 2 for D and Y (all elements in list1).
I want output as:
newlist = [(A,10),(D,40),(Y,250)]
Are you looking for something like this:
list1 = [('A',0.75),('D',0.49),('Y',0.36)]
mainlist = [('A',10),('B',20),('C',30),('D',40),('E',50),('Y',250),('Z',260)]
keys = {k[0] for k in list1} # create a set with keys from list1
newlist = [k for k in mainlist if k[0] in keys] # get items from mainlist with good keys
print(newlist)
Output:
[('A', 10), ('D', 40), ('Y', 250)]
Please note that the set @Sait used in his (quite beautiful) solution is an unordered data structure of unique items:
unordered: meaning you can not safely assume that the order of the keys will still be 'A', 'D', 'Y' in the set.
unique: meaning that if you have repeated keys, like 'A', 'D', 'Y', 'A', only 'A', 'D', 'Y' would show up in your output, as each item can only appear once in a set.
That said, you could save yourself some trouble by using a dictionary for your lookups instead of a list of tuples.
>>> list1 = [('A',0.75),('D',0.49),('Y',0.36),('D',0.49)]
>>> maindict = dict({'A':10, 'B':20, 'C':30, 'D':40, 'E':50, 'Y':250, 'Z':260})
>>> keys, values = zip(*list1) # unzip the tuples in list1 into two separate tuples
>>> newlist = [(key, maindict[key]) for key in keys]
>>> print(newlist)
[('A', 10), ('D', 40), ('Y', 250), ('D', 40)]
This solution is guaranteed to preserve the order of your input and can handle repeated keys as well. By using a dictionary, you do not need to iterate over the whole mainlist to find your keys. You can iterate over your keys and get each value with a single dictionary lookup (which is quite fast) per key.
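If the reference data starts out as a list of tuples, as in the question, dict() converts it directly. The sketch below also skips keys that are missing from the reference data. That skipping behaviour is an assumption, since the question does not say what should happen in that case, and the ('Q', 0.10) entry is a made-up example of such a key.

```python
list1 = [('A', 0.75), ('D', 0.49), ('Y', 0.36), ('Q', 0.10)]  # 'Q' is hypothetical
mainlist = [('A', 10), ('B', 20), ('C', 30), ('D', 40), ('E', 50), ('Y', 250), ('Z', 260)]

maindict = dict(mainlist)  # a list of (key, value) tuples converts directly

# Keep the order of list1 and skip keys absent from maindict ('Q' here)
newlist = [(key, maindict[key]) for key, _ in list1 if key in maindict]
print(newlist)
```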
Given a dictionary keyed by 2-element tuples, I want to return all the key-value pairs whose keys contain a given element.
For example, the dictionary can be:
tupled_dict = {('a',1):1, ('a',2):0, ('b',1):1, ('c',4):0}
and the given element is 'a', then the key-value pairs that should be returned would be:
('a',1):1, ('a',2):0
What is the fastest code to do this?
EDIT:
In addition, as a related sub-question, I am interested in the fastest way to delete all such key-value pairs given an element of the keys. Obviously, once I have the results of the above, I can use a loop to delete each dictionary item one by one, but I wonder if there is a short-cut way to do it.
To get those ones:
>>> {k: v for k, v in tupled_dict.iteritems() if 'a' in k}
{('a', 1): 1, ('a', 2): 0}
Similarly, to delete the other ones:
>>> tupled_dict = {k: v for k, v in tupled_dict.iteritems() if 'a' not in k}
>>> tupled_dict
{('b', 1): 1, ('c', 4): 0}
I haven't tested it for performance, but I suggest you start by getting a baseline using a for loop, and then another with dict comprehensions.
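One way to get such a baseline is the standard timeit module. The sketch below is written for Python 3 (items() instead of iteritems()). The dictionary size and the repeat count are arbitrary assumptions, and the timings will vary by machine, so none are shown.

```python
import timeit

# Hypothetical test data: 10,000 keys, a quarter of them containing 'a'
tupled_dict = {(letter, i): i % 2 for i in range(2500) for letter in 'abcd'}

def with_loop():
    result = {}
    for k, v in tupled_dict.items():
        if 'a' in k:
            result[k] = v
    return result

def with_comprehension():
    return {k: v for k, v in tupled_dict.items() if 'a' in k}

# Both approaches must agree before their timings are worth comparing
assert with_loop() == with_comprehension()

print('for loop:          ', timeit.timeit(with_loop, number=100))
print('dict comprehension:', timeit.timeit(with_comprehension, number=100))
```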
>>> {k:v for k, v in tupled_dict.iteritems() if k[0] == 'a'}
{('a', 1): 1, ('a', 2): 0}
This snippet will work even if 'a' isn't the first element in a key tuple:
from operator import methodcaller
contains_a = methodcaller('__contains__', 'a')
keys = filter(contains_a, tupled_dict)
new_dict = dict(zip(keys, map(tupled_dict.get, keys)))
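On Python 3, filter returns a one-shot iterator rather than a list, so using keys in both zip and map would consume it from two places and pair keys with the wrong values. A sketch of the same idea adapted for Python 3, where wrapping the result in list() is the only change:

```python
from operator import methodcaller

tupled_dict = {('a', 1): 1, ('a', 2): 0, ('b', 1): 1, ('c', 4): 0}

contains_a = methodcaller('__contains__', 'a')
# list() materializes the matching keys so they can be iterated more than once
keys = list(filter(contains_a, tupled_dict))
new_dict = dict(zip(keys, map(tupled_dict.get, keys)))
print(new_dict)
```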