Removing items of a certain index from a dictionary? - python

If I've got a dictionary and it's sorted, and I want to remove the first three items (in order of value) from it by index (no matter what the contents of the initial dictionary was), what do I do? How would I go about doing so?
I was hoping it would let me just slice (such as one does with lists), but I've been made aware that that's impossible.
EDIT: By index I mean indices. So for example, were I to remove the items from 1 to 3 of the sorted dictionary below, after it was sorted by value, then I would only be left with "eggs".
EDIT 2: How do I find the keys in those places then (in indices 0, 1, 2)?
EDIT 3: I'm not allowed to import or print in this.
For example:
>>>food = {"ham":12, "cookie":5, "eggs":16, "steak":2}
>>>remove_3(food)
{"eggs":16}

Get key value pairs (.items()), sort them by value (item[1]), and take the first 3 ([:3]):
for key, value in sorted(food.items(), key=lambda item: item[1])[:3]:
    del food[key]
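Wrapped up as the remove_3 function from the question, this could look like the sketch below. It needs no imports and no printing. Whether the function should change its argument or return a copy is not stated in the question; this version assumes a copy is wanted and leaves the input alone. The sorted key list also answers EDIT 2, since indices 0, 1 and 2 of that list are the keys being removed.

```python
def remove_3(d):
    """Return a copy of d without its three lowest-valued items."""
    # Sort the keys by their value and keep the first three: indices 0, 1, 2.
    doomed = sorted(d, key=d.get)[:3]
    # Rebuild the dictionary without those keys.
    return {key: value for key, value in d.items() if key not in doomed}


food = {"ham": 12, "cookie": 5, "eggs": 16, "steak": 2}
result = remove_3(food)  # {'eggs': 16}
```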

Try the following:
import operator
from collections import OrderedDict
food = {"ham": 12, "cookie": 5, "eggs": 16, "steak": 2}
ordered_dict = OrderedDict(sorted(food.items(), key=operator.itemgetter(1)))
for key in list(ordered_dict)[:3]:
    del ordered_dict[key]
Output:
>>> ordered_dict
OrderedDict([('eggs', 16)])

Firstly, regarding your statement:
If I've got a dictionary and it's sorted
Dictionaries in Python were historically unordered, so a plain dict could not preserve a sort order (since Python 3.7 a dict does keep insertion order). If you want to create a dictionary in sorted order, use collections.OrderedDict(). For example:
>>> from collections import OrderedDict
>>> from operator import itemgetter
>>> food = {"ham":12, "cookie":5, "eggs":16, "steak":2}
>>> my_ordered_dict = OrderedDict(sorted(food.items(), key=itemgetter(1)))
The value held by my_ordered_dict will be:
>>> my_ordered_dict
OrderedDict([('steak', 2), ('cookie', 5), ('ham', 12), ('eggs', 16)])
which is equivalent to dict preserving the order as:
{
    'steak': 2,
    'cookie': 5,
    'ham': 12,
    'eggs': 16
}
To build a dict that excludes the three items with the lowest values, slice the items. In Python 3, items() returns a view, so convert it to a list first (in Python 2 it already returns a list of (key, value) tuples):
>>> dict(list(my_ordered_dict.items())[3:])
{'eggs': 16}
>>> OrderedDict(list(my_ordered_dict.items())[3:])  # to maintain the order
OrderedDict([('eggs', 16)])

Related

Convert a list with duplicating keys into a dictionary and sum the values for each duplicating key

I am new to Python so I do apologize that my first question might not be asked clearly to achieve the right answer.
I thought if I converted a list with duplicating keys into a dictionary then I would be able to sum the values of each duplicating key. I have tried to search on Google and Stack Overflow but I actually still can't solve this problem.
Can anybody help, please? Thank you very much in advance and I truly appreciate your help.
list1 = ["a:2", "b:5", "c:7", "a:8", "b:12"]
My expected output is:
dict = {a: 10, b: 17, c: 7}
You can try this code:
list1 = ["a:2", "b:5", "c:7", "a:8", "b:12"]
l1 = [each.split(":") for each in list1]
d1 = {}
for each in l1:
    if each[0] not in d1:
        d1[each[0]] = int(each[1])
    else:
        d1[each[0]] += int(each[1])
d1
Output: {'a': 10, 'b': 17, 'c': 7}
Explanation:
Step 1. Convert your given list to key-value pairs by splitting each element of the original list on ":" and storing the results in a list.
Step 2. Initialize an empty dictionary.
Step 3. Iterate through each key-value pair in the newly created list and store it in the dictionary. If the key doesn't exist yet, add a new key-value pair to the dictionary; otherwise add the value to its corresponding key.
A list does not have "keys" per se; rather, it has elements. In your example, the elements themselves are key-value pairs. To make the dictionary you want, you have to do 3 things:
Parse each element into its key-value pair
Handle duplicate keys
Add each pair to the dictionary.
The code should look like this:
list1 = ["a:2", "b:5", "c:7", "a:8", "b:12"]
dict1 = {}  # make an empty dictionary
for element in list1:
    key, value = element.split(':')  # split each list element into key and value
    if key in dict1:  # check if the key is in the dictionary
        dict1[key] += int(value)  # add to existing key
    else:
        dict1[key] = int(value)  # initialize new key
print(dict1)
That code prints out
{'a': 10, 'c': 7, 'b': 17}
You could use a defaultdict, iterate over each string and add the corresponding value after splitting it to a pair (key, value).
>>> from collections import defaultdict
>>> res = defaultdict(int)
>>> for el in list1:
...     k, v = el.split(':')
...     res[k] += int(v)
...
>>> res
defaultdict(<class 'int'>, {'a': 10, 'b': 17, 'c': 7})
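If a plain dict is preferred as the result, dict.get with a default of 0 gives the same totals without any import. A minimal sketch (the name totals is only illustrative):

```python
list1 = ["a:2", "b:5", "c:7", "a:8", "b:12"]

totals = {}
for element in list1:
    key, value = element.split(":")
    # get() returns 0 for a key that has not been seen yet.
    totals[key] = totals.get(key, 0) + int(value)
```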

Ordering a nested dictionary by the frequency of the nested value

I have this list made from a csv which is massive.
For every item in the list, I have broken it into its id and details. id is always at most 3 characters long and details is variable.
I created an empty dictionary, D...(rest of code below):
D={}
for v in list:
    id = v[0:3]
    details = v[3:]
    if id not in D:
        D[id] = {}
    if details not in D[id]:
        D[id][details] = 0
    D[id][details] += 1
aside: Can you help me understand what the two if statements are doing? Very new to python and programming.
Anyway, it produces something like this:
{'KEY1_1': {'key2_1' : value2_1, 'key2_2' : value2_2, 'key2_3' : value2_3},
'KEY1_2': {'key2_1' : value2_1, 'key2_2' : value2_2, 'key2_3' : value2_3},
and many more KEY1's with variable numbers of key2's
Each 'KEY1' is unique but each 'key2' isn't necessarily. The value2_x's are all different.
Ok so, right now I found a way to sort by the first KEY
for k, v in sorted(D.items()):
    print k, ':', v
I have done enough research to know that dictionaries can't really be sorted but I don't care about sorting, I care about ordering or more specifically frequencies of occurrence. In my code value2_x is the number of times its corresponding key2_x occurs for that particular KEY1_x. I am starting to think I should have used better variable names.
Question: How do I order the top-level/overall dictionary by the number in value2_x which is in the nested dictionary? I want to do some statistics to those numbers like...
How many times does the most frequent KEY1_x:key2_x pair show up?
What are the 10, 20, 30 most frequent KEY1_x:key2_x pairs?
Can I only do that by each KEY1 or can I do it overall? Bonus: If I could order it that way for presentation/sharing that would be very helpful because it is such a large data set. So much thanks in advance and I hope I've made my question and intent clear.
You could use Counter to order the key pairs based on their frequency. It also provides an easy way to get x most frequent items:
from collections import Counter
d = {
    'KEY1': {
        'key2_1': 5,
        'key2_2': 1,
        'key2_3': 3
    },
    'KEY2': {
        'key2_1': 2,
        'key2_2': 3,
        'key2_3': 4
    }
}
c = Counter()
for k, v in d.items():
    c.update({(k, k1): v1 for k1, v1 in v.items()})
print(c.most_common(3))
Output:
[(('KEY1', 'key2_1'), 5), (('KEY2', 'key2_3'), 4), (('KEY2', 'key2_2'), 3)]
(Two pairs are tied at 3, so the last entry may instead be (('KEY1', 'key2_3'), 3), depending on the Python version.)
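On the "per KEY1 or overall" part of the question: the Counter above ranks the pairs overall. A per-KEY1 ranking can be sketched by wrapping each nested dictionary in its own Counter. The name top_per_key and the cut-off of 2 are made up for this example:

```python
from collections import Counter

d = {
    'KEY1': {'key2_1': 5, 'key2_2': 1, 'key2_3': 3},
    'KEY2': {'key2_1': 2, 'key2_2': 3, 'key2_3': 4},
}

# For every top-level key, keep its two most frequent nested keys.
top_per_key = {k: Counter(v).most_common(2) for k, v in d.items()}
```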
If you only care about the most common key pairs and have no other reason to build nested dictionary you could just use the following code:
from collections import Counter
l = ['foobar', 'foofoo', 'foobar', 'barfoo']
D = Counter((v[:3], v[3:]) for v in l)
print(D.most_common())  # [(('foo', 'bar'), 2), (('foo', 'foo'), 1), (('bar', 'foo'), 1)]
Short explanation: ((v[:3], v[3:]) for v in l) is a generator expression that will generate tuples where first item is the same as top level key in your original dict and second item is the same as key in nested dict.
>>> x = list((v[:3], v[3:]) for v in l)
>>> x
[('foo', 'bar'), ('foo', 'foo'), ('foo', 'bar'), ('bar', 'foo')]
Counter is a subclass of dict. It accepts an iterable as an argument; each unique element of the iterable becomes a key, and its value is the number of times that element occurs in the iterable.
>>> c = Counter(x)
>>> c
Counter({('foo', 'bar'): 2, ('foo', 'foo'): 1, ('bar', 'foo'): 1})
Since generator expression is an iterable there's no need to convert it to list in between so construction can simply be done with Counter((v[:3], v[3:]) for v in l).
The if statements you asked about are checking if the key exists in dict:
>>> d = {1: 'foo'}
>>> 1 in d
True
>>> 2 in d
False
So the following code checks whether the key id exists in the dict D, and if it doesn't, assigns an empty dict to it.
if id not in D:
    D[id] = {}
The second if does exactly the same for nested dictionaries.
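For comparison, both checks can be handed to the standard library. The sketch below is one possible rewrite of the question's loop, not the asker's code; rows stands in for the CSV-derived list, and the sample strings are invented:

```python
from collections import Counter, defaultdict

rows = ['foobar', 'foofoo', 'foobar', 'barfoo']  # invented sample data

# A missing top-level key gets an empty Counter automatically,
# and a missing nested key starts counting from 0.
D = defaultdict(Counter)
for v in rows:
    D[v[:3]][v[3:]] += 1
```

Here D['foo']['bar'] ends up as 2, with no explicit "if ... not in" test.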

Find least frequent value in dictionary

I'm working on a problem that asks me to return the least frequent value in a dictionary and I can't seem to work it out besides with a few different counts, but there aren't a set number of values in the dictionaries being provided in the checks.
For example, suppose the dictionary contains mappings from students' names (strings) to their ages (integers). Your method would return the least frequently occurring age. Consider a dictionary variable d containing the following key/value pairs:
{'Alyssa':22, 'Char':25, 'Dan':25, 'Jeff':20, 'Kasey':20, 'Kim':20, 'Mogran':25, 'Ryan':25, 'Stef':22}
Three people are age 20 (Jeff, Kasey, and Kim), two people are age 22 (Alyssa and Stef), and four people are age 25 (Char, Dan, Mogran, and Ryan). So rarest(d) returns 22 because only two people are that age.
Would anyone mind pointing me in the right direction please? Thanks!
Counting the members of a collection is the job of collections.Counter:
d={'Alyssa':22, 'Char':25, 'Dan':25, 'Jeff':20, 'Kasey':20, 'Kim':20, 'Mogran':25, 'Ryan':25, 'Stef':22}
import collections
print(collections.Counter(d.values()).most_common()[-1][0])
22
You can create an empty dict for the counters, then loop through the dict you've got and add 1 to the corresponding value in the second dict, then return the key of the element with the minimum value in the second dict.
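That description could be sketched as follows, with no imports. Ties go to whichever value min meets first, which is an assumption the question does not settle:

```python
def rarest(d):
    """Return the value that occurs least often in dictionary d."""
    counts = {}
    # Count how many times each value (age) occurs.
    for age in d.values():
        counts[age] = counts.get(age, 0) + 1
    # Pick the age whose count is smallest.
    return min(counts, key=counts.get)


d = {'Alyssa': 22, 'Char': 25, 'Dan': 25, 'Jeff': 20, 'Kasey': 20,
     'Kim': 20, 'Mogran': 25, 'Ryan': 25, 'Stef': 22}
answer = rarest(d)  # 22
```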
from collections import Counter
min(Counter(my_dict_of_ages.values()).items(), key=lambda x: x[1])[0]
would do it, I think: min returns the (age, count) pair with the smallest count, and [0] takes the age from it.
You can use collections.Counter
d={'Alyssa':22, 'Char':25, 'Dan':25, 'Jeff':20, 'Kasey':20, 'Kim':20, 'Mogran':25, 'Ryan':25, 'Stef':22}
import collections
print(collections.Counter(d.values()).most_common()[-1][0])
Or write your own function:
def rarest(d):
    values = list(d.values())
    least_frequent = None
    lowest_count = len(values) + 1  # higher than any possible count
    for x in set(values):
        count = values.count(x)
        if count < lowest_count:
            lowest_count = count
            least_frequent = x
    return {least_frequent: lowest_count}
>>> rarest({'Alyssa':22, 'Char':25, 'Dan':25, 'Jeff':20, 'Kasey':20, 'Kim':20, 'Mogran':25, 'Ryan':25, 'Stef':22})
{22: 2}
You could create a second dictionary that uses the values in the first (ages) as keys in the second, with the values of the second as counts. Then sort the values of the second and do a reverse look-up to get the associated keys (there are a few ways to do this efficiently by treating the list of keys and the list of values as numpy arrays).
import numpy as np
d = {'Alyssa':22, 'Char':25, 'Dan':25, 'Jeff':20, 'Kasey':20, 'Kim':20, 'Mogran':25, 'Ryan':25, 'Stef':22}
def rarest(d):
    s = {}
    # First, map ages to counts.
    for key in d:
        if d[key] not in s:
            s[d[key]] = 1
        else:
            s[d[key]] += 1  # Could use a defaultdict for this.
    # Second, sort on the counts to find the rarest.
    keys = np.array(list(s.keys()))
    values = np.array(list(s.values()))
    ordering = np.argsort(values)
    return keys[ordering][0]
There's probably a more efficient way to do this, but that seems to work.
my_dict = {'Alyssa':22, 'Char':25, 'Dan':25, 'Jeff':20, 'Kasey':20, 'Kim':20, 'Mogran':25, 'Ryan':25, 'Stef':22}
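values = list(my_dict.values())
most_frequent = None
highest_count = 0
for x in set(values):
    count = values.count(x)
    if count > highest_count:  # compare counts with counts, not with an age
        highest_count = count
        most_frequent = x
print(most_frequent)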
This code uses the built-in set(), which returns a set with all unique elements, i.e.:
>> set([1, 2, 3, 4, 2, 1])
set([1, 2, 3, 4])
To extract all the values from the dict, you can use dict.values(). Likewise, you have dict.keys() and dict.items().
>> my_dict.keys()
['Char', 'Stef', 'Kim', 'Jeff', 'Kasey', 'Dan', 'Mogran', 'Alyssa', 'Ryan']
>> my_dict.values()
[25, 22, 20, 20, 20, 25, 25, 22, 25]
>> my_dict.items()
[('Char', 25),
('Stef', 22),
('Kim', 20),
('Jeff', 20),
('Kasey', 20),
('Dan', 25),
('Mogran', 25),
('Alyssa', 22),
('Ryan', 25)]
In case anyone else prefers to remember as few function/property names and packages as possible, JadedTuna's answer is good. Here's my go-to:
val_count = {}
for v in d.values():  # count the values (ages), not the keys
    if v in val_count:
        val_count[v] += 1
    else:
        val_count[v] = 1
val_count = list(val_count.items())  # Convert dict to [(v1, n1), (v2, n2), ...]
val_count.sort(key=lambda tup: tup[1])  # Sorts by count. Add reverse=True if you'd like mode instead
val_count[0]  # (value, count) pair with the lowest count

Python: OrderedDictionary sorting based on length of key's value

I have an object like this:
t = {'rand_key_1': ['x'], 'rand_key_2': [13,23], 'rand_key_3': [(1)], 'rk5': [1,100,3,4,3,3]}
a dictionary with random keys (string and/or int) which ALL have a list as a value, with varying sizes.
I want to turn this dictionary into an OrderedDict which is ordered depending on the Length of the list of the dictionary items. So after ordering I want to get:
t_ordered = {'rk5': ..., 'rand_key_2': .., 'rand_key_1': .., 'rand_key_3': ..}
(If two or more items have lists of the same length, their order does not really matter.)
I tried this but I am failing:
OrderedDict(sorted(d, key=lambda t: len(t[1])))
I am not experienced, so excuse me if what I tried is uber stupid.
What can I do?
Thank you.
You were actually very close with the sorting function you passed to sorted. The thing to note is that iterating over a dictionary yields its keys, so sorted will return a list of the dictionary's keys in order. So if we fix your function to index the dictionary with each key:
>>> sorted(t, key=lambda k: len(t[k]))
['rand_key_3', 'rand_key_1', 'rand_key_2', 'rk5']
You can also ask for the keys in reverse order and iterate directly over them:
>>> for sorted_key in sorted(t, key=lambda k: len(t[k]), reverse=True):
...     print sorted_key, t[sorted_key]
rk5 [1, 100, 3, 4, 3, 3]
rand_key_2 [13, 23]
rand_key_3 [1]
rand_key_1 ['x']
Usually you wouldn't need to create an OrderedDict, as you would just iterate over a new sorted list using the latest dictionary data.
Using simple dictionary sorting first and then using OrderedDict():
>>> from collections import OrderedDict as od
>>> k=sorted(t, key=lambda x:len(t[x]), reverse=True)
>>> k
['rk5', 'rand_key_2', 'rand_key_3', 'rand_key_1']
>>> od((x, t[x]) for x in k)
OrderedDict([('rk5', [1, 100, 3, 4, 3, 3]), ('rand_key_2', [13, 23]), ('rand_key_3', [1]), ('rand_key_1', ['x'])])
Since an ordered dictionary remembers its insertion order, you can sort the items by the length of each value (item[1]) and insert them in that order:
OrderedDict(sorted(t.items(), key=lambda item: len(item[1]), reverse=True))
OrderedDict in Python is a collection that remembers the order in which items were inserted. Ordered in this context does not mean sorted.
If all you need is to get all the items in sorted order you can do something like this:
for key, value in sorted(t.items(), key=lambda x: -len(x[1])):
    # do something with key and value
However, you are still using an unsorted data structure - just iterating over it in sorted order. This still does not support operations like looking up the k-th element, or the successor or predecessor of an element in the dict.
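If positional lookups are needed, one option is to keep the sorted items in a list, which can be indexed. A rough sketch using the dictionary t from the question; the name by_length is made up here, and the list goes stale if t changes later:

```python
t = {'rand_key_1': ['x'], 'rand_key_2': [13, 23],
     'rand_key_3': [(1)], 'rk5': [1, 100, 3, 4, 3, 3]}

# Items ordered from the longest list to the shortest.
by_length = sorted(t.items(), key=lambda item: len(item[1]), reverse=True)

longest = by_length[0]         # ('rk5', [1, 100, 3, 4, 3, 3])
second_longest = by_length[1]  # ('rand_key_2', [13, 23])
```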

Search and sort through dictionary in Python

I need to sort and search through a dictionary. I know that dictionary cannot be sorted. But all I need to do search through it in a sorted format. The dictionary itself is not needed to be sorted.
There are 2 values. A string, which is a key and associated with the key is an integer value. I need to get a sorted representation based on the integer. I can get that with OrderedDict.
But instead of the whole dictionary I need to print just the top 50 values. And I need to extract some of the keys using RegEx. Say all the keys starting with 'a' and of 5 length.
On a side note can someone tell me how to print in a good format in python? Like:
{'secondly': 2,
'pardon': 6,
'saves': 1,
'knelt': 1}
instead of a single line. Thank you for your time.
If you want to sort the dictionary based on the integer value you can do the following.
d = {'secondly': 2, 'pardon': 6, 'saves': 1, 'knelt': 1}
a = sorted(d.items(), key=lambda x: x[1], reverse=True)
a will then contain a list of tuples:
[('pardon', 6), ('secondly', 2), ('saves', 1), ('knelt', 1)]
You can limit this to the top 50 with a[:50] and then search through the keys with your search pattern.
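Putting the two steps together might look like the sketch below. The pattern (keys that start with 'a' and are 5 characters long) comes from the question; the sample data is invented so that something matches:

```python
import re

d = {'apple': 4, 'angle': 7, 'pardon': 6, 'about': 1, 'ab': 9}  # invented data

# Sort by value, highest first, and keep at most 50 pairs.
top = sorted(d.items(), key=lambda item: item[1], reverse=True)[:50]

# Keep the keys that start with 'a' and are exactly 5 characters long.
pattern = re.compile(r'^a.{4}$')
matches = [(key, value) for key, value in top if pattern.match(key)]
```

With this data, matches holds 'angle', 'apple' and 'about', still ordered by value.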
There are a bunch of ways to get a sorted dict; sorted and items() are your friends.
data = {'secondly': 2, 'pardon': 6, 'saves': 1, 'knelt': 1}
The pattern I use most is:
key = sorted(data.items())
print(key)  # [('knelt', 1), ('pardon', 6), ('saves', 1), ('secondly', 2)]
key_desc = sorted(data.items(), reverse=True)
print(key_desc)  # [('secondly', 2), ('saves', 1), ('pardon', 6), ('knelt', 1)]
To sort on the value and not the key, you need to pass your own key function to sorted.
value = sorted(data.items(), key=lambda x: x[1])
print(value)  # [('saves', 1), ('knelt', 1), ('secondly', 2), ('pardon', 6)]
value_desc = sorted(data.items(), key=lambda x: x[1], reverse=True)
print(value_desc)  # [('pardon', 6), ('secondly', 2), ('saves', 1), ('knelt', 1)]
For nice formatting check out the pprint module.
If I'm understanding correctly, an OrderedDict isn't really what you want. OrderedDicts remember the order in which keys were added; they don't track the values. You could get what you want using generators to transform the initial data:
import re, operator
thedict = {'secondly':2, 'pardon':6, ....}
pat = re.compile('^a....$') # or whatever
top50 = sorted(((k, v) for (k, v) in thedict.items() if pat.match(k)), reverse=True, key=operator.itemgetter(1))[:50]
As you're using OrderedDict already, you can probably do what you need with a list comprehension. Something like:
[key for key in list(d)[:50] if re.match('regex', key)]
Please post your current code if you need something more specific.
For the multi-line pretty print, use pprint with the optional width parameter if needed:
In [1]: import pprint
In [2]: d = {'a': 'a', 'b': 'b' }
In [4]: pprint.pprint(d)
{'a': 'a', 'b': 'b'}
In [6]: pprint.pprint(d,width=20)
{'a': 'a',
'b': 'b'}
There are a few different tools that can help you:
The sorted function takes an iterable and returns a new list of its elements in sorted order. So you could say something like for key, value in sorted(d.iteritems()).
The filter function takes an iterable and a function, and returns only those elements for which the function evaluates to True. So, for instance, filter(lambda x: your_condition(x), d.iteritems()) would give you a list of key-value tuples, which you could then sort through as above. (In Python 3, filter returns an iterator, which is even better.)
Generator expressions let you combine all of the above into one. For instance, if you only care about the values, you could write (value for key, value in sorted(d.iteritems()) if condition), which would return an iterator.
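As an illustration of the last point, a generator expression can sort and filter in one expression and is consumed lazily. The condition (value greater than 1) is made up for the example, and items() is used so the sketch also runs on Python 3:

```python
d = {'secondly': 2, 'pardon': 6, 'saves': 1, 'knelt': 1}

# Values in key order, keeping only those greater than 1.
big_values = (value for key, value in sorted(d.items()) if value > 1)

result = list(big_values)  # [6, 2]
```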
You could sort through the keys of the dictionary:
d = {'secondly': 2,
     'pardon': 6,
     'saves': 1,
     'knelt': 1}
for key in sorted(d.keys()):
    print(d[key])
This will sort your output based on the keys (in your case, the strings, alphabetically).
