If I have a word like "hello", I want the program to generate a dictionary with the keys being the number of occurrences of the letters and values being a list of the letters.
So "hello" would generate {1: ['h', 'e', 'o'], 2: ['l']}.
from collections import defaultdict, Counter
def occurrences(s):
    h = defaultdict(list)
    for k, v in Counter(s).items():
        h[v].append(k)
    return h
occurrences("hello")
Output
defaultdict(<class 'list'>, {1: ['h', 'e', 'o'], 2: ['l']})
A Counter is a dictionary whose missing keys default to zero: with c = Counter() you can do c[key] += 1 even if key isn't already in c. An additional benefit is that if you pass it an iterable, it immediately builds a dictionary of counts. A string is treated as a sequence of characters.
Thus, Counter("hello") is the dictionary Counter({'l': 2, 'h': 1, 'e': 1, 'o': 1})
It's this dictionary you are trying to "reverse".
Now you just need to create a dictionary of lists and append each letter to the list whose key is that letter's count in the Counter.
There is another dictionary class, somewhat like Counter: defaultdict. It lets you decide what the initial value of a missing key will be. For instance, a defaultdict(list) has initial value [] (that is, list()). So with h = defaultdict(list), you can do h[1].append("e") even if 1 is not already a key of h.
Note that both Counter and defaultdict are subclasses of dict.
See also the documentation of the collections module.
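For comparison, the same inversion can be sketched with a plain dict and dict.setdefault instead of defaultdict. This is an alternative to the answer above, not its code; the function name invert_counts is hypothetical, and the list order relies on dicts keeping insertion order (Python 3.7+).

```python
from collections import Counter


def invert_counts(s):
    """Map each occurrence count to the list of characters with that count."""
    inverted = {}
    for ch, n in Counter(s).items():
        # setdefault inserts an empty list the first time a count is seen
        inverted.setdefault(n, []).append(ch)
    return inverted


print(invert_counts("hello"))  # {1: ['h', 'e', 'o'], 2: ['l']}
```

The result is an ordinary dict, so its repr does not carry the defaultdict(<class 'list'>, ...) wrapper.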
There are several ways to do this. You need a dictionary with counts as keys and lists as values, so you can use defaultdict.
from collections import Counter, defaultdict
def occurrences(s):
    inverted = defaultdict(list)
    for k, v in Counter(s).items():
        inverted[v].append(k)
    return inverted
The code creates a special dictionary: if you access a key that was never defined, its value is an empty list, so you can append values without initializing them first. Counter counts every character in the given string. For "hello", Counter(s).items() is dict_items([('h', 1), ('e', 1), ('l', 2), ('o', 1)]), so we swap each key/value pair as you asked. For more information, see the collections library documentation.
Related
I'm using Aiohttp's implementation of multidict().
Take this:
>>> d = MultiDict([('a', 1), ('b', 2), ('a', 3)])
>>> d
<MultiDict {'a': 1, 'b': 2, 'a': 3}>
I want to convert d to a regular dictionary where duplicate key values are appended into a list such as this:
{'a': [1, 3], 'b': 2}
Is there an elegant way to convert this, other than looping through the items with a lot of conditional logic?
It doesn't look like multidicts have a built-in function for a direct conversion, but you can use .keys() to iterate through the multidict and copy the values into a fresh dictionary.
new_dict = {}
for k in set(multi_dict.keys()):
    new_dict[k] = multi_dict.getall(k)
Two interesting things here: we need to wrap the keys in a set to remove duplicates, and multidicts have a .getall() method that returns a list of all values associated with a given key.
EDIT for single value cases:
new_dict = {}
for k in set(multi_dict.keys()):
    k_values = multi_dict.getall(k)
    if len(k_values) > 1:
        new_dict[k] = k_values
    else:
        new_dict[k] = k_values[0]
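A single-pass variant can be sketched under the assumption that multi_dict.items() yields every (key, value) pair, including repeated keys; a plain list of pairs stands in for it below so the sketch runs without the multidict package. It avoids calling getall() once per key and, unlike set(), keeps keys in first-seen order. group_pairs is a hypothetical helper name.

```python
def group_pairs(pairs):
    """Group (key, value) pairs; keys seen only once keep a bare value."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    # Unwrap single-element lists, mirroring the EDIT above
    return {k: v if len(v) > 1 else v[0] for k, v in grouped.items()}


# A plain list stands in for multi_dict.items() here
print(group_pairs([('a', 1), ('b', 2), ('a', 3)]))  # {'a': [1, 3], 'b': 2}
```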
I have 2 lists which correspond to what I would like to be my key:value pairs, for example:
list_1 = [1,1,1,1,1,1,1,1,1,2,2,2,2,2,2] #(key)
list_2 = [x,x,x,y,g,r,t,w,r,r,r,t,f,c,d] #(value)
I've (kind of) been able to create a dictionary via: dict = dict(zip(list_1, [list_2]))
However the problem with this is that it is only picking up '1' as a key and also results in duplicate entries within the list of values for the key.
Can anyone suggest a way to create a dictionary so that only the unique values from list_2 are mapped to their corresponding key?
Thanks
EDIT:
The output I'm looking for is one dictionary keyed by 1 and 2, with lists as values containing only the unique values for each key, i.e.:
dict = {1: [x,y,g,r,t,w], 2: [r,t,f,c,d]}
This sort of problem is properly solved with a collections.defaultdict(set); the defaultdict gives you easy auto-vivification of sets for each key on demand, and the set uniquifies the values associated with each key:
from collections import defaultdict
mydict = defaultdict(set)
for k, v in zip(list_1, list_2):
    mydict[k].add(v)
You can then convert the result to a plain dict with list values with:
mydict = {k: list(v) for k, v in mydict.items()}
If the order of the values must be preserved, on modern Python (3.7+, where dicts keep insertion order) you can use dicts instead of sets (on older Python, you'd use collections.OrderedDict):
mydict = defaultdict(dict)
for k, v in zip(list_1, list_2):
    mydict[k][v] = True  # Dummy value; we're using a dict to get an ordered set of the keys
The conversion to a plain dict with list values is unchanged.
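Put together, the ordered variant looks like the following sketch. It uses the question's sample data, with the letters quoted as strings (an assumption, since the question wrote them as bare names), and needs Python 3.7+ for the value order to be guaranteed.

```python
from collections import defaultdict

list_1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
list_2 = ['x', 'x', 'x', 'y', 'g', 'r', 't', 'w', 'r', 'r', 'r', 't', 'f', 'c', 'd']

# Each inner dict acts as an insertion-ordered set of values
ordered = defaultdict(dict)
for k, v in zip(list_1, list_2):
    ordered[k][v] = True

result = {k: list(v) for k, v in ordered.items()}
print(result)  # {1: ['x', 'y', 'g', 'r', 't', 'w'], 2: ['r', 't', 'f', 'c', 'd']}
```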
If the input is already sorted by key, itertools.groupby is theoretically slightly more efficient (it's strictly O(n), vs. average-case O(n) for dicts), but in practice the defaultdict is typically as fast or faster (the implementation of groupby has some unavoidable inefficiencies). Just for illustration, the groupby solution would be:
from itertools import groupby
from operator import itemgetter
mydict = {k: {v for _, v in grp} for k, grp in groupby(zip(list_1, list_2), key=itemgetter(0))}
# Or preserving order of insertion:
getval = itemgetter(1)  # Construct once to avoid repeated construction
mydict = {k: list(dict.fromkeys(map(getval, grp)))
          for k, grp in groupby(zip(list_1, list_2), key=itemgetter(0))}
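The "already sorted" precondition matters: groupby only merges consecutive equal keys. The small keys/values lists below are made up for illustration; they show that on unsorted input a later group silently overwrites an earlier one for the same key, and that a stable sort by key first fixes it.

```python
from itertools import groupby
from operator import itemgetter

keys = [2, 1, 2, 1]
values = ['a', 'b', 'c', 'd']

# Unsorted input: groupby yields 2, 1, 2, 1 as four separate groups,
# so later groups overwrite earlier ones in the dict comprehension
naive = {k: [v for _, v in grp]
         for k, grp in groupby(zip(keys, values), key=itemgetter(0))}
print(naive)  # {2: ['c'], 1: ['d']}

# Sort by key first; sorted() is stable, so values keep their relative order
pairs = sorted(zip(keys, values), key=itemgetter(0))
fixed = {k: [v for _, v in grp] for k, grp in groupby(pairs, key=itemgetter(0))}
print(fixed)  # {1: ['b', 'd'], 2: ['a', 'c']}
```

The sort costs O(n log n), which is one more reason the defaultdict approach is usually preferable for unsorted data.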
A dictionary can't contain the same key twice, but it can have each key once with a list of values. For that you can use a one-line comprehension:
my_dict = {key:[list_2[i] for i in range(len(list_2)) if list_1[i]==key] for key in set(list_1)}
Or a more classic method
my_dict = {}
for key_id in range(len(list_1)):
    if list_1[key_id] not in my_dict:
        my_dict[list_1[key_id]] = []
    my_dict[list_1[key_id]].append(list_2[key_id])
In both cases the result (which still contains duplicates) is
my_dict = {1: ['x', 'x', 'x', 'y', 'g', 'r', 't', 'w', 'r'], 2: ['r', 'r', 't', 'f', 'c', 'd']}
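Since the question asks for unique values only, one possible post-processing step (not part of this answer) is to deduplicate each list with dict.fromkeys, which keeps the first occurrence of each value in its original position:

```python
my_dict = {1: ['x', 'x', 'x', 'y', 'g', 'r', 't', 'w', 'r'],
           2: ['r', 'r', 't', 'f', 'c', 'd']}

# dict.fromkeys keeps the first occurrence of each value, in order
unique = {key: list(dict.fromkeys(vals)) for key, vals in my_dict.items()}
print(unique)  # {1: ['x', 'y', 'g', 'r', 't', 'w'], 2: ['r', 't', 'f', 'c', 'd']}
```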
The problem is that your keys aren't unique: there are only two distinct keys, 1 and 2. A dictionary can't hold {1: x, 1: y} at the same time, for example, unless you change one of the keys to something new and unique.
For your purpose I would use tuples:
list(set(tuple(zip(list_1, list_2))))
The set keeps only the unique (key, value) pairs, which drops the duplicates, though it does not preserve their order.
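If the original order of the pairs matters, a sketch of an order-preserving alternative to the set is dict.fromkeys over the zipped pairs (tuples are hashable, so they can be dict keys). The letters are quoted as strings here, an assumption since the question used bare names.

```python
list_1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
list_2 = ['x', 'x', 'x', 'y', 'g', 'r', 't', 'w', 'r', 'r', 'r', 't', 'f', 'c', 'd']

# dict.fromkeys deduplicates the (key, value) tuples while keeping
# the first occurrence of each one in its original position
unique_pairs = list(dict.fromkeys(zip(list_1, list_2)))
print(unique_pairs)
```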
keys = [1,1,1,1,1,1,1,1,1,2,2,2,2,2,2]
values = ['x','x','x','y','g','r','t','w','r','r','r','t','f','c','d']
result = {}
for key, value in zip(keys, values):
    if key not in result:
        result[key] = []
    if value not in result[key]:
        result[key].append(value)
print(result)
{1: ['x', 'y', 'g', 'r', 't', 'w'], 2: ['r', 't', 'f', 'c', 'd']}
Note:
zip(keys, values) creates an iterable of tuples, each consisting of one element from keys and the corresponding element from values:
(1,'x')
(1,'x')
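One detail worth knowing about zip, shown with made-up lists: it pairs items by position and stops at the shorter input, so a length mismatch between the key and value lists drops the extra items without any error.

```python
keys = [1, 1, 2]
values = ['x', 'y', 'z', 'extra']

# zip stops at the shorter input, so 'extra' is silently dropped
print(list(zip(keys, values)))  # [(1, 'x'), (1, 'y'), (2, 'z')]
```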
I would like to set up an empty dictionary ind_dict at the beginning, and then, each time I get a (key, val) pair, add val (a float) to the existing value if the key already exists in ind_dict, or otherwise add the new (key, val) pair.
That is a use case for defaultdict:
from collections import defaultdict
ind_dict = defaultdict(float)
for key, val in [('a', 1), ('b', 2), ('a', 3)]:
    ind_dict[key] += val
Now:
>>> ind_dict
defaultdict(<class 'float'>, {'a': 4.0, 'b': 2.0})
defaultdict(default_factory[, ...]) --> dict with default factory
The default factory is called without arguments to produce
a new value when a key is not present, in __getitem__ only.
A defaultdict compares equal to a dict with the same items.
All remaining arguments are treated the same as if they were
passed to the dict constructor, including keyword arguments.
Here the default factory is float, and calling float() returns 0.0.
So you can add to a missing key directly, without any if statements.
def addFloatToFloatDic(tup,dic):
    if tup[0] in dic:          # check if in dict, if so
        dic[tup[0]] += tup[1]  # add value
    else:                      # else
        dic[tup[0]] = tup[1]   # use as initial value
ind_dic = {}
addFloatToFloatDic(("k",12.1),ind_dic)
print (ind_dic)
addFloatToFloatDic(("k",12.1),ind_dic)
print (ind_dic)
Output:
{'k': 12.1}
{'k': 24.2}
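Both approaches can also be sketched with a plain dict and dict.get, which supplies a default of 0.0 for keys not seen yet and removes the explicit if/else. The helper name add_float is hypothetical.

```python
def add_float(key, val, dic):
    """Accumulate val into dic[key], starting from 0.0 for new keys."""
    dic[key] = dic.get(key, 0.0) + val


ind_dict = {}
for key, val in [('a', 1), ('b', 2), ('a', 3)]:
    add_float(key, val, ind_dict)
print(ind_dict)  # {'a': 4.0, 'b': 2.0}
```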
Say I have a dictionary called word_counter_dictionary that counts how many times each word appears in a document, in the form {'word' : number}. For example, the word "secondly" appears one time, so its key/value pair would be {'secondly' : 1}. I want to make an inverted dictionary so that the numbers become keys and the words become the values for those keys, so I can then graph the 25 most used words. I saw somewhere that the setdefault() function might come in handy, but I can't use it because so far in my class we have only covered get().
inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    inverted_dictionary[new_key] = word_counter_dictionary.get(new_key, '') + str(key)
inverted_dictionary
So far, using this method above, it works fine until it reaches another word with the same value. For example, the word "saves" also appears once in the document, so Python will add the new key/value pair just fine. BUT it erases the {1 : 'secondly'} with the new pair so that only {1 : 'saves'} is in the dictionary.
So, bottom line, my goal is to get ALL of the words and their respective number of repetitions in this new dictionary called inverted_dictionary.
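Staying within get() only, a sketch of one fix: call get() on inverted_dictionary (the code above calls it on word_counter_dictionary), and use a list as the default so words are collected rather than concatenated into one string. The sample counts, including 'the': 5, are made up for illustration.

```python
word_counter_dictionary = {'secondly': 1, 'saves': 1, 'the': 5}

inverted_dictionary = {}
for word in word_counter_dictionary:
    count = word_counter_dictionary[word]
    # get() returns [] for a count not seen yet; "+ [word]" builds a new list
    inverted_dictionary[count] = inverted_dictionary.get(count, []) + [word]

print(inverted_dictionary)  # {1: ['secondly', 'saves'], 5: ['the']}
```

Building a new list on every step copies it each time, which is fine for a word-count exercise but slower than appending in place for very large inputs.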
A defaultdict is perfect for this:
word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
from collections import defaultdict
d = defaultdict(list)
for key, value in word_counter_dictionary.items():
    d[value].append(key)
print(d)
Output:
defaultdict(<class 'list'>, {1: ['first'], 2: ['second', 'fourth'], 3: ['third']})
What you can do is store, as each value, a list of the words that share the same count:
word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(str(key))
    else:
        inverted_dictionary[new_key] = [str(key)]
print(inverted_dictionary)
Output:
{1: ['first'], 2: ['second', 'fourth'], 3: ['third']}
Python dicts do NOT allow repeated keys, so you can't use a simple dictionary to store multiple elements with the same key (1 in your case). For your example, I'd rather have a list as the value of your inverted dictionary, and store in that list the words that share the number of appearances, like:
inverted_dictionary = {}
for key in word_counter_dictionary:
    new_key = word_counter_dictionary[key]
    if new_key in inverted_dictionary:
        inverted_dictionary[new_key].append(key)
    else:
        inverted_dictionary[new_key] = [key]
In order to get the 25 most repeated words, you should iterate through the (sorted) keys in the inverted_dictionary and store the words:
common_words = []
for key in sorted(inverted_dictionary.keys(), reverse=True):
    if len(common_words) < 25:
        common_words.extend(inverted_dictionary[key])
    else:
        break
common_words = common_words[:25] # In case there are more than 25 words
Here's a version that doesn't "invert" the dictionary:
>>> import operator
>>> A = {'a':10, 'b':843, 'c': 39, 'd': 10}
>>> B = sorted(A.items(), key=operator.itemgetter(1), reverse=True)
>>> B
[('b', 843), ('c', 39), ('a', 10), ('d', 10)]
Instead, it creates a list that is sorted, highest to lowest, by value.
To get the top 25, you simply slice it: B[:25].
And here's one way to get the keys and values separated (after putting them into a list of tuples):
>>> [x[0] for x in B]
['b', 'c', 'a', 'd']
>>> [x[1] for x in B]
[843, 39, 10, 10]
or
>>> C, D = zip(*B)
>>> C
('b', 'c', 'a', 'd')
>>> D
(843, 39, 10, 10)
Note that if you only want to extract the keys or the values (and not both), you could have done so earlier. These are just examples of how to handle the list of tuples.
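The standard library also covers this "top N by value" case directly: collections.Counter accepts a mapping of counts, and its most_common(n) method returns the n highest-count (key, count) pairs, highest first. A short sketch with the same sample dict:

```python
from collections import Counter

A = {'a': 10, 'b': 843, 'c': 39, 'd': 10}

# most_common(n) returns the n highest-count (key, count) pairs, highest first
top = Counter(A).most_common(2)
print(top)  # [('b', 843), ('c', 39)]
```

For the original question, Counter(word_counter_dictionary).most_common(25) would give the 25 most used words without building an inverted dictionary.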
For getting the largest elements of some dataset an inverted dictionary might not be the best data structure.
Either put the items in a sorted list (example assumes you want to get to two most frequent words):
word_counter_dictionary = {'first':1, 'second':2, 'third':3, 'fourth':2}
counter_word_list = sorted((count, word) for word, count in word_counter_dictionary.items())
Result:
>>> print(counter_word_list[-2:])
[(2, 'second'), (3, 'third')]
Or use Python's included batteries (heapq.nlargest in this case):
import heapq, operator
print(heapq.nlargest(2, word_counter_dictionary.items(), key=operator.itemgetter(1)))
Result:
[('third', 3), ('second', 2)]
I'm trying to find the most efficient way in Python to create a dictionary of 'guids' (point IDs in Rhino), retrieve them depending on the value(s) I assign them, change those value(s), and store them back in the dictionary. One catch is that in Rhinoceros3d the points have randomly generated IDs which I don't know, so I can only look them up by the value I give them.
Are dictionaries the correct way? Should the guids be the values instead of the keys?
A very basic example:
arrPts=[]
arrPts = rs.GetPoints() # ---> creates a list of point-ids
ptsDict = {}
for ind, pt in enumerate(arrPts):
    ptsDict[pt] = ('A'+str(ind))
for i in ptsDict.values():
    if '1' in i :
        print ptsDict.keys()
How can I make the above code print only the key whose value contains '1', instead of all the keys? And how can I then change that value from 1 to, e.g., 2?
Any help on the general question would also be appreciated, so I know I'm heading in the right direction.
Thanks
Pav
You can use dict.items().
An example:
In [1]: dic={'a':1,'b':5,'c':1,'d':3,'e':1}
In [2]: for x,y in dic.items():
...:     if y==1:
...:         print x
...:         dic[x]=2
...:
a
c
e
In [3]: dic
Out[3]: {'a': 2, 'b': 5, 'c': 2, 'd': 3, 'e': 2}
In Python 2.x, dict.items() returns a list of (key, value) tuples:
In [4]: dic.items()
Out[4]: [('a', 2), ('c', 2), ('b', 5), ('e', 2), ('d', 3)]
In Python 3.x it returns a view object instead of a list.
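A Python 3 sketch of the same idea that separates the lookup from the update: first collect the matching keys with a comprehension over items(), then reassign their values. Reassigning existing keys while iterating over a view is allowed in Python 3, but adding or removing keys during iteration raises RuntimeError, so collecting first is a safe habit.

```python
dic = {'a': 1, 'b': 5, 'c': 1, 'd': 3, 'e': 1}

# Collect the keys whose value is 1 without changing anything yet
matches = [k for k, v in dic.items() if v == 1]
print(matches)  # ['a', 'c', 'e']

# Then update them in a separate step
for k in matches:
    dic[k] = 2
print(dic)  # {'a': 2, 'b': 5, 'c': 2, 'd': 3, 'e': 2}
```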
I think you want the GUIDs to be values, not keys, since it looks like you want to look them up by something you assign... but it really depends on your use case.
# list of GUID's / Rhinoceros3d point ids
arrPts = ['D20EA4E1-3957-11d2-A40B-0C5020524153',
'1D2680C9-0E2A-469d-B787-065558BC7D43',
'ED7BA470-8E54-465E-825C-99712043E01C']
# reference each of these by a unique key
ptsDict = dict((i, value) for i, value in enumerate(arrPts))
# now `ptsDict` looks like: {0:'D20EA4E1-3957-11d2-A40B-0C5020524153', ...}
print(ptsDict[1]) # easy to "find" the one you want to print
# basically make both keys: `2`, and `1` point to the same guid
# Note: we've just "lost" the previous guid that the `2` key was pointing to
ptsDict[2] = ptsDict[1]
Edit:
If you were to use a tuple as the key to your dict, it would look something like:
ptsDict = {(loc, dist, attr3, attr4): 'D20EA4E1-3957-11d2-A40B-0C5020524153',
(loc2, dist2, attr3, attr4): '1D2680C9-0E2A-469d-B787-065558BC7D43',
...
}
Dict keys can't be changed in place (and tuples are immutable anyway), but you can remove one key and insert another:
oldval = ptsDict.pop((loc2, dist2, attr3, attr4)) # remove old key and get value
ptsDict[(locx, disty, attr3, attr4)] = oldval # insert it back in with a new key
In order to have one key point to multiple values, you'd have to use a list or set to contain the guids:
{(loc, dist, attr3, attr4): ['D20E...', '1D2680...']}
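Combining the last two ideas, a sketch using defaultdict(list) so that each tuple key can collect several GUIDs, plus the pop-and-reinsert "rename". The attribute tuples ('locA', 1.0) and so on are hypothetical stand-ins for (loc, dist, attr3, attr4); the GUID strings are the ones from the example above.

```python
from collections import defaultdict

# Hypothetical attribute tuples as keys; each maps to a list of GUIDs
pts = defaultdict(list)
pts[('locA', 1.0)].append('D20EA4E1-3957-11d2-A40B-0C5020524153')
pts[('locA', 1.0)].append('1D2680C9-0E2A-469d-B787-065558BC7D43')
pts[('locB', 2.5)].append('ED7BA470-8E54-465E-825C-99712043E01C')

# "Renaming" a key: pop the old entry, then merge its GUIDs under the new key
guids = pts.pop(('locB', 2.5))
pts[('locC', 2.5)].extend(guids)

print(dict(pts))
```

Using extend rather than plain assignment means the GUIDs are merged if the new key already had some.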