Adding Multiple Values to Single Key in Python Dictionary Comprehension - python

I wanted to learn how to use dictionary comprehension and decided to use one for the previously solved task. I need to assign multiple values to the same key. I was wondering if there's a better way to achieve what I'm trying to do than with the code I've written so far.
graph = {(x1,y1): [(c,d) for a,b,c,d in data if a == x1 and b == y1] for x1 ,y1, x2, y2 in data}
For example I have this:
data = {(1,2,1,5),(1,2,7,2),(1,5,4,7),(4,7,7,5)}
The first two values of each tuple should form a key, and the remaining two should be added to the list of values stored under that key.
With the given example I would like to return:
{(1, 2): [(1, 5), (7, 2)], (1, 5): [(4, 7)], (4, 7): [(7, 5)]}
Is there an easier way to do it than iterate through the entire data just to find the matching values?

A dict comprehension isn't efficient here: it loops over the whole input again for every item.
It's more Pythonic to use a simple for loop that iterates over the data only once:
from collections import defaultdict
data = {(1,2,1,5),(1,2,7,2),(1,5,4,7),(4,7,7,5)}
output = defaultdict(list)
for a, b, c, d in data:
    output[a, b].append((c, d))

Your code is neat but the time complexity is O(n^2), which can be reduced to O(n).
data = {(1,2,1,5),(1,2,7,2),(1,5,4,7),(4,7,7,5)}
result = dict()
for item in data:
    key = (item[0], item[1])
    # setdefault returns the existing list, or stores and returns a new one
    result.setdefault(key, []).append((item[2], item[3]))
print(result)
In my opinion, a for loop makes the code easier to understand.

I don't know if it is the best answer, but I would do something like this:
m_dict = {}
for val in data:
    key = (val[0],val[1])
    if key in m_dict:
        m_dict[key].append((val[2],val[3]))
    else:
        m_dict[key] = [(val[2],val[3])]
Or more concisely using setdefault:
m_dict = {}
for val in data:
    key = (val[0],val[1])
    obj = m_dict.setdefault(key,[])
    obj.append((val[2],val[3]))

In this instance, I would use itertools.groupby. For your example:
dict(groupby(data, lambda t: (t[0], t[1])))
This will produce a dict with the keys equal to (1, 2), (1, 5), and (4, 7) and the values consisting of (1, 2, 1, 5), (1, 2, 7, 2)... which should be sufficient for most uses. You can also post-process the grouped list, if need be.
As noted in the comments below, groupby requires sorted data. As such, you will need to sort before grouping and will probably want to cast the iterator to a list:
from itertools import groupby
first_two = lambda tup: (tup[0], tup[1])
groups = groupby(sorted(data, key=first_two), first_two)
target = {k: list(g) for k, g in groups}
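The post-processing mentioned above could look roughly like this. It is only a sketch, assuming the goal is the output format from the question (key = first two elements, value = list of the remaining pairs); the name `graph` is borrowed from the question and `first_two` is rewritten as a plain function for illustration.

```python
from itertools import groupby

data = {(1, 2, 1, 5), (1, 2, 7, 2), (1, 5, 4, 7), (4, 7, 7, 5)}

def first_two(tup):
    """Grouping key: the first two elements of a 4-tuple."""
    return tup[:2]

# groupby only merges adjacent items, so sort first. Sorting the full
# tuples also puts the values inside each group in a predictable order.
graph = {
    key: [tup[2:] for tup in group]
    for key, group in groupby(sorted(data), key=first_two)
}
print(graph)
```

Because of the sort this is O(n log n), so the single-pass loops in the other answers are still the cheaper option for large inputs.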

Related

Python - Get key of specific tuple index minimum in dictionary of tuples

I have a dict of tuples such as:
d = {'a': (3, 5), 'b': (5, 8), 'c': (9, 3)}
I want to return the key of the minimum of the tuple values based on the tuple index. For example, if using tuple index = 0, then 'a' would be returned. If index = 1, then 'c' would be returned. I have tried using min(), for example
min(d, key=d.get)
but am not sure how to manipulate it to select the tuple index to use. Although there are similar questions, I have not found an answer to this. Apologies in advance if this is a duped question, and please link to the answer. Thanks
You can write a lambda function to get the elements from the value by their index:
min(d, key=lambda k: d[k][0])
# 'a'
min(d, key=lambda k: d[k][1])
# 'c'
Since multiple keys could have the same value, you might want to return a list of matching keys, not just a single key.
def min_keys(d, index):
    # Initialize lists
    values = []
    matches = []
    # Append tuple items to list based on index
    for t in d.values():
        values.append(t[index])
    # Compute the minimum once instead of on every iteration
    smallest = min(values)
    # If the item matches the min, append the key to list
    for key in d:
        if d[key][index] == smallest:
            matches.append(key)
    # Return a list of all keys with min value at index
    return matches
Dictionaries are unsorted and have no index.
If you want to return the key that comes first alphabetically, you could use the ASCII order (this works for single-character keys only):
print(chr(min([ord(key) for key in d.keys()])))
Here's a portable method you can use for dicts with a structure like yours, and feel free to choose the index of interest in the tuple:
def extract_min_key_by_index(cache, index):
    min_val = float('inf')
    min_key = None
    # Iterate over the dict that was passed in, not the global d
    for k, v in cache.items():
        if v[index] < min_val:
            min_key, min_val = k, v[index]
    return min_key
d = {'a': (3, 5), 'b': (5, 8), 'c': (9, 3)}
INDEX = 0
print(extract_min_key_by_index(d, INDEX))

comparing list of tuple elements python

I have two lists of tuples
t1 = [ ('a',3,4), ('b',3,4), ('c',4,5) ]
t2 = [ ('a',4,6), ('c',3,4), ('b',3,6), ('d',4,5) ]
Such that
the order of the tuples may not be the same order and
the lists may not contain the same amount of tuple elements.
My goal is to compare the two lists such that if the string element matches, then compare the last integer element in the tuple and return a list containing -1 if t1[2] < t2[2], 0 if they are equal and 1 if they are greater than.
I've tried different variations, but the problem I have is finding a way to match the strings so that the right tuples are compared.
return [diff_unique(x[2],y[2]) for x,y in zip(new_list,old_list) ]
Where diff_unique does the aforementioned comparison of the integers, and new_list is t1 and old_list is t2.
I've also tried this:
return [diff_unique(x[2],y[2]) for x,y in zip(new_list,old_list) if x[0]==y[0]]
What I intend to do is use the returned list and create a new four-tuple list with the original t1 values along with the difference from the matching t2 tuple. i.e
inc_dec_list = compare_list(new,old)
final_list = [ (f,r,u,chge) for (f,r,u), chge in zip(new,inc_dec_list)]
Where new = t1 and old = t2. This may have been an important detail, sorry I missed it.
Any help in the right direction?
Edit: I have added my test program, which mimics my original intent, for those who want to help. Thank you all.
import os
import sys
old = [('a',10,1),('b',10,2),('c',100,4),('d',200,4),('f',45,2)]
new = [('a',10,2),('c',10,2),('b',100,2),('d',200,6),('e',233,4),('g',45,66)]
def diff_unique(a,b):
    print("a:{} = b:{}".format(a,b))
    if a < b:
        return -1
    elif a==b:
        return 0
    else:
        return 1
def compare_list(new_list, old_list):
    a = { t[0]:t[1:] for t in new_list }
    b = { t[0]:t[1:] for t in old_list }
    common = list( set(a.keys())&set(b.keys()))
    return [diff_unique(a[key][1], b[key][1]) for key in common]
    #get common tuples
    #common = [x for x,y in zip(new_list,old_list) if x[0] == y[0] ]
    #compare common to old list
    #return [diff_unique(x[2],y[2]) for x,y in zip(new_list,old_list) ]
inc_dec_list = compare_list(new,old)
print(inc_dec_list)
final_list = [ (f,r,u,chge) for (f,r,u), chge in zip(new,inc_dec_list)]
print(final_list)
To match the tuples by string from different lists, you can use dict comprehension (order inside the tuples is preserved):
a = {t[0]:t[1:] for t in t1} # {'a': (3, 4), 'c': (4, 5), 'b': (3, 4)}
b = {t[0]:t[1:] for t in t2} # {'a': (4, 6), 'c': (3, 4), 'b': (3, 6), 'd': (4, 5)}
Then you can iterate over the keys of both dictionaries and do the comparison. Assuming you only want to do the comparison for keys/tuples present in t1 and t2, you can join the keys using sets:
common_keys = list(set(a.keys())&set(b.keys()))
And finally compare the dictionary's items and create the list you want like this:
return [diff_unique(a[key][1],b[key][1]) for key in common_keys ]
If you need the output in the order of the alphabetically sorted characters, use the sorted function on the keys:
return [diff_unique(a[key][1],b[key][1]) for key in sorted(common_keys) ]
If you want all keys to be considered, you can do the following:
all_keys = set(a) | set(b)
l = list()
for key in sorted(all_keys):
    try:
        l.append(diff_unique(a[key][1],b[key][1]))
    except KeyError:
        l.append("whatever you want")
return l
With the new information about what values should be returned in what order, the solution would be this:
ordered_keys = [t[0] for t in t1]
a = {t[0]:t[1:] for t in t1} # {'a': (3, 4), 'c': (4, 5), 'b': (3, 4)}
b = {t[0]:t[1:] for t in t2} # {'a': (4, 6), 'c': (3, 4), 'b': (3, 6), 'd': (4, 5)}
l = list()
for key in ordered_keys: # keep the order of t1 so the result lines up with it
    try:
        l.append(diff_unique(a[key][1],b[key][1]))
    except KeyError:
        l.append(0) # default value
return l
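Putting the pieces together, here is a rough end-to-end sketch of the four-tuple list the question asks for. It assumes that names missing from t2 should get a change of 0 (the same default as above); `sign` and `last_by_name` are illustrative names, not part of the original code.

```python
t1 = [('a', 3, 4), ('b', 3, 4), ('c', 4, 5)]
t2 = [('a', 4, 6), ('c', 3, 4), ('b', 3, 6), ('d', 4, 5)]

def sign(x, y):
    """Return -1, 0 or 1 when x is less than, equal to or greater than y."""
    return (x > y) - (x < y)

# Index t2 by its string element so lookups do not depend on list order.
last_by_name = {name: last for name, _, last in t2}

# Walk t1 in its own order; names missing from t2 get the default 0.
final_list = [
    (name, middle, last,
     sign(last, last_by_name[name]) if name in last_by_name else 0)
    for name, middle, last in t1
]
print(final_list)
# [('a', 3, 4, -1), ('b', 3, 4, -1), ('c', 4, 5, 1)]
```

Building the change directly inside the comprehension avoids the separate zip step, so the values cannot drift out of line with t1.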
First, build a default dictionary from each list, with the default value for a nonexistent key being a tuple whose last element is the smallest possible value for a comparison.
SMALL = (-float('inf'),)
from collections import defaultdict
d1 = defaultdict(lambda: SMALL, [(t[0], t[1:]) for t in t1])
d2 = defaultdict(lambda: SMALL, [(t[0], t[1:]) for t in t2])
Next, iterate over the keys in each dictionary (which can be created easily with itertools.chain). You probably want to sort the keys for the resulting list to have any meaning (otherwise, how do you know which keys produced which of -1/0/1?)
from itertools import chain
all_keys = set(chain(d1, d2))
result = [cmp(d1[k][-1], d2[k][-1]) for k in sorted(all_keys)]
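Note that the cmp built-in was removed in Python 3. A minimal sketch of the same idea for Python 3, assuming a small hand-written cmp replacement is acceptable (the helper below is not part of the original answer):

```python
from collections import defaultdict

t1 = [('a', 3, 4), ('b', 3, 4), ('c', 4, 5)]
t2 = [('a', 4, 6), ('c', 3, 4), ('b', 3, 6), ('d', 4, 5)]

# Sentinel for missing keys: compares lower than any real value.
SMALL = (float('-inf'),)

def cmp(x, y):
    """Stand-in for Python 2's cmp: -1, 0 or 1."""
    return (x > y) - (x < y)

d1 = defaultdict(lambda: SMALL, [(t[0], t[1:]) for t in t1])
d2 = defaultdict(lambda: SMALL, [(t[0], t[1:]) for t in t2])

# Collect the keys before indexing, because a defaultdict lookup
# inserts the missing key as a side effect.
all_keys = set(d1) | set(d2)
result = [cmp(d1[k][-1], d2[k][-1]) for k in sorted(all_keys)]
print(result)
# [-1, -1, 1, -1]
```

The last entry is -1 because 'd' exists only in t2, so the sentinel from d1 is compared against 5.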
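Here is a simple solution to your problem.
It is not a one-liner like your attempt, but I hope it still helps:
result = []
for a in t1:
    for b in t2:
        if a[0] != b[0]:
            continue
        # collect every match instead of returning after the first one
        # (cmp is a Python 2 built-in)
        result.append(cmp(a[-1], b[-1]))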
In Python 3.x, you can compare two lists of tuples a and b like this:
import operator
a = [(1,2),(3,4)]
b = [(3,4),(1,2)]
# convert both lists to sets before calling the eq function
print(operator.eq(set(a),set(b))) #True

Search and sort through dictionary in Python

I need to sort and search through a dictionary. I know that a dictionary cannot be sorted, but all I need to do is search through it in sorted order. The dictionary itself does not need to be sorted.
Each entry has two parts: a string, which is the key, and an integer value associated with that key. I need to get a sorted representation based on the integer. I can get that with OrderedDict.
But instead of the whole dictionary I need to print just the top 50 values. I also need to extract some of the keys using a regex, for example all keys that start with 'a' and are 5 characters long.
On a side note can someone tell me how to print in a good format in python? Like:
{'secondly': 2,
 'pardon': 6,
 'saves': 1,
 'knelt': 1}
instead of a single line. Thank you for your time.
If you want to sort the dictionary based on the integer value you can do the following.
d = {'secondly': 2, 'pardon': 6, 'saves': 1, 'knelt': 1}
a = sorted(d.items(), key=lambda x:x[1], reverse=True)
a will then contain a list of tuples:
[('pardon', 6), ('secondly', 2), ('saves', 1), ('knelt', 1)]
You can limit this to the top 50 with a[:50] and then search through the keys with your search pattern.
There are a bunch of ways to get a sorted view of a dict; sorted and items() (iteritems() in Python 2) are your friends.
data = {'secondly': 2, 'pardon': 6, 'saves': 1, 'knelt': 1}
The pattern I use most is:
key = sorted(data.items())
print(key) #[('knelt', 1), ('pardon', 6), ('saves', 1), ('secondly', 2)]
key_desc = sorted(data.items(), reverse=True)
print(key_desc) #[('secondly', 2), ('saves', 1), ('pardon', 6), ('knelt', 1)]
To sort on the value and not the key, you need to pass sorted a key function.
value = sorted(data.items(), key=lambda x:x[1])
print(value) #[('saves', 1), ('knelt', 1), ('secondly', 2), ('pardon', 6)]
value_desc = sorted(data.items(),key=lambda x:x[1], reverse=True)
print(value_desc) #[('pardon', 6), ('secondly', 2), ('saves', 1), ('knelt', 1)]
For nice formatting check out the pprint module.
If I'm understanding correctly, an OrderedDict isn't really what you want. OrderedDicts remember the order in which keys were added; they don't track the values. You could get what you want using generators to transform the initial data:
import re, operator
thedict = {'secondly':2, 'pardon':6, ....}
pat = re.compile('^a....$') # or whatever
top50 = sorted(((k,v) for (k,v) in thedict.items() if pat.match(k)), reverse=True, key=operator.itemgetter(1))[:50]
As you're using OrderedDict already, you can probably do what you need with a list comprehension. Something like:
[ key for key in list(d)[:50] if re.match('regex', key) ]
Please post your current code if you need something more specific.
For the multi-line pretty print, use pprint with the optional width parameter if needed:
In [1]: import pprint
In [2]: d = {'a': 'a', 'b': 'b' }
In [4]: pprint.pprint(d)
{'a': 'a', 'b': 'b'}
In [6]: pprint.pprint(d,width=20)
{'a': 'a',
 'b': 'b'}
There are a few different tools that can help you:
The sorted function takes an iterable and returns a new list of its elements in sorted order. So you could say something like for key, value in sorted(d.items()).
The filter function takes a function and an iterable, and returns only those elements for which the function evaluates to True. So, for instance, filter(lambda x: your_condition(x), d.items()) would give you a list of key-value tuples, which you could then sort through as above. (In Python 3, filter returns an iterator, which is even better.)
Generator expressions let you combine all of the above into one. For instance, if you only care about the values, you could write (value for key, value in sorted(d.items()) if condition), which would return an iterator.
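As a rough illustration of how the tools listed above fit together, here is a sketch that sorts by value, keeps only keys matching a pattern, and stops after 50 results. The sample dictionary and the pattern are made up for the example, and it assumes you want the top 50 of the matching keys (rather than the matches within the overall top 50).

```python
import re
from itertools import islice

# Hypothetical sample data; the real dictionary would be much larger.
d = {'apple': 4, 'angle': 9, 'pardon': 6, 'amber': 1, 'knelt': 1, 'ab': 7}

# Keys that start with 'a' and are exactly 5 characters long.
pattern = re.compile(r'a.{4}')

# Sort (key, value) pairs by value, largest first.
by_value = sorted(d.items(), key=lambda item: item[1], reverse=True)

# Lazily keep the pairs whose key matches, then take at most 50 of them.
matching = (item for item in by_value if pattern.fullmatch(item[0]))
top = list(islice(matching, 50))
print(top)
# [('angle', 9), ('apple', 4), ('amber', 1)]
```

islice is used instead of slicing because a generator expression cannot be sliced with [:50].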
You could sort through the keys of the dictionary:
d = {'secondly': 2,
     'pardon': 6,
     'saves': 1,
     'knelt': 1}
for key in sorted(d):
    print(d[key])
This sorts your output by key (in your case, alphabetically by the string).

creating cumulative percentage from a dictionary of data

Given a dictionary (or Counter) of tally data like the following:
d={'dan':7, 'mike':2, 'john':3}
and a new dictionary "d_cump" that I want to contain cumulative percentages
d_cump={'mike':16, 'john':41, 'dan':100}
EDIT: Should clarify that order doesn't matter for my input set, which is why I'm using a dictionary or counter. Order does matter when calculating cumulative percentages so I need to sort the data for that operation, once I have the cumulative percentage for each name then I put it back in a dictionary since again, order shouldn't matter if I'm looking at single values.
What is the most elegant/pythonic way to get from d to d_cump?
Here is what I have; it seems a bit clumsy:
from numpy import cumsum
d={'dan':7, 'mike':2, 'john':3}
sorted_keys = sorted(d,key=lambda x: d[x])
perc = [x*100/sum(d.values()) for x in cumsum([ d[x] for x in sorted_keys ])]
d_cump=dict(zip(sorted_keys,perc))
>>> d_cump
{'mike': 16, 'john': 41, 'dan': 100}
It's hard to tell how a cumulative percentage would be valuable considering the order of the original dictionary is arbitrary.
That said, here's how I would do it:
from numpy import cumsum
from operator import itemgetter
d={'dan':7, 'mike':2, 'john':3}
#unzip keys from values in a sorted order
keys, values = zip(*sorted(d.items(), key=itemgetter(1)))
total = sum(values)
# calculate cumsum and zip with keys into new dict
d_cump = dict(zip(keys, (100*subtotal/total for subtotal in cumsum(values))))
Note that there is no special order to the results because dictionaries are not ordered:
{'dan': 100, 'john': 41, 'mike': 16}
Since you're using numpy anyway, you can bypass/simplify the list comprehensions:
>>> from numpy import cumsum
>>> d={'dan':7, 'mike':2, 'john':3}
>>> sorted_keys = sorted(d,key=d.get)
>>> z = cumsum(sorted(d.values())) # or z = cumsum([d[k] for k in sorted_keys])
>>> d2 = dict(zip(sorted_keys, 100.0*z/z[-1]))
>>> d2
{'mike': 16, 'john': 41, 'dan': 100}
but as noted elsewhere, it feels weird to be using a dictionary this way.
Calculating a cumulative value? Sounds like a fold to me!
d = {'dan':7, 'mike':2, 'john':3}
denominator = float(sum(d.viewvalues()))
data = ((k,(v/denominator)) for (k, v) in sorted(d.viewitems(), key = lambda (k,v):v))
import functional
f = lambda (k,v), l : [(k, v+l[0][1])]+l
functional.foldr(f, [(None,0)], [('a', 1), ('b', 2), ('c', 3)])
#=>[('a', 6), ('b', 5), ('c', 3), (None, 0)]
d_cump = { k:v for (k,v) in functional.foldr(f, [(None,0)], data) if k is not None }
functional isn't a built-in package. You could also re-jig f to work with a left fold, and hence the standard reduce, if you wanted.
As you can see, this isn't much shorter, but it takes advantage of sequence destructuring to avoid splitting/zipping, and it uses a generator as the intermediate data, which avoids building a list.
If you want to further minimise object creation, you can use this alternative function, which modifies the initial list passed in (but has to use a stupid trick to return the appropriate value, because list.insert returns None).
uni = lambda x:x
ff = lambda (k,v), l : uni(l) if l.insert(0, (k, v+l[0][1])) is None else uni(l)
Incidentally, the left fold is very easy using ireduce (from this page http://www.ibm.com/developerworks/linux/library/l-cpyiter/index.html ), because it eliminates the list construction:
ff = lambda (l, ll), (k,v), : (k, v+ll)
g = ireduce(ff, data, (None, 0))
tuple(g)
#=>(('mike', 0.16666666666666666), ('john', 0.41666666666666663), ('dan', 1.0))
def ireduce(func, iterable, init=None):
    if init is None:
        iterable = iter(iterable)
        curr = iterable.next()
    else:
        curr = init
    for x in iterable:
        curr = func(curr, x)
        yield curr
This is attractive because the initial value is not included, and because generators are lazy, and so particularly suitable for chaining.
Note that ireduce above is equivalent to:
def ireduce(func, iterable, init=None):
    from functional import scanl
    if init is None: init = next(iterable)
    sg = scanl(func, init, iterable)
    next(sg)
    return sg
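In Python 3 the standard library covers this kind of running fold with itertools.accumulate, so no third-party package is needed. A minimal sketch, assuming truncated whole-number percentages as in the question:

```python
from itertools import accumulate

d = {'dan': 7, 'mike': 2, 'john': 3}

# Sort by tally so the running total grows from the smallest count up.
keys = sorted(d, key=d.get)
total = sum(d.values())

# accumulate yields the running sums 2, 5, 12 for the sorted tallies.
running_totals = accumulate(d[name] for name in keys)

# Integer division truncates each percentage to a whole number.
d_cump = {name: 100 * running // total
          for name, running in zip(keys, running_totals)}
print(d_cump)
# {'mike': 16, 'john': 41, 'dan': 100}
```

Like ireduce, accumulate is lazy and does not emit the initial value separately, so it chains well with zip.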

Dictionary of dictionaries in Python?

From another function, I have tuples like this ('falseName', 'realName', positionOfMistake), eg. ('Milter', 'Miller', 4).
I need to write a function that make a dictionary like this:
D={realName:{falseName:[positionOfMistake], falseName:[positionOfMistake]...},
realName:{falseName:[positionOfMistake]...}...}
The function has to take a dictionary and a tuple like above, as arguments.
I was thinking something like this for a start:
def addToNameDictionary(d, tup):
    dictionary={}
    tup=previousFunction(string)
    for element in tup:
        if not dictionary.has_key(element[1]):
            dictionary.append(element[1])
        elif:
            if ...
But it is not working and I am kind of stuck here.
If it is only to add a new tuple and you are sure that there are no collisions in the inner dictionary, you can do this:
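def addNameToDictionary(d, tup):
    # The tuple is (falseName, realName, positionOfMistake)
    false_name, real_name, position = tup
    if real_name not in d:
        d[real_name] = {}
    d[real_name][false_name] = [position]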
Using collections.defaultdict is a big time-saver when you're building dicts and don't know beforehand which keys you're going to have.
Here it's used twice: for the resulting dict, and for each of the values in the dict.
import collections
def aggregate_names(errors):
    result = collections.defaultdict(lambda: collections.defaultdict(list))
    for real_name, false_name, location in errors:
        result[real_name][false_name].append(location)
    return result
Combining this with your code:
dictionary = aggregate_names(previousFunction(string))
Or to test:
EXAMPLES = [
    ('Fred', 'Frad', 123),
    ('Jim', 'Jam', 100),
    ('Fred', 'Frod', 200),
    ('Fred', 'Frad', 300)]
print(aggregate_names(EXAMPLES))
A dictionary's setdefault is a good way to update an existing dict entry if it's there, or create a new one if it's not, all in one go:
Looping style:
# This is our sample data
data = [("Milter", "Miller", 4), ("Milter", "Miler", 4), ("Milter", "Malter", 2)]
# dictionary we want for the result
dictionary = {}
# loop that makes it work
for realName, falseName, position in data:
    dictionary.setdefault(realName, {})[falseName] = position
dictionary now equals:
{'Milter': {'Malter': 2, 'Miler': 4, 'Miller': 4}}
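If you want the exact structure from the question, with the real name as the outer key and a list of positions per false name, setdefault can be chained. This is only a sketch: the sample tuples are made up, and it assumes the (falseName, realName, positionOfMistake) order given in the question.

```python
# Hypothetical input in the question's order: (falseName, realName, position)
data = [('Milter', 'Miller', 4), ('Miler', 'Miller', 3), ('Milter', 'Miller', 2)]

dictionary = {}
for false_name, real_name, position in data:
    # Outer setdefault creates the inner dict on first sight of a real name;
    # inner setdefault creates the list on first sight of a false name.
    dictionary.setdefault(real_name, {}).setdefault(false_name, []).append(position)

print(dictionary)
# {'Miller': {'Milter': [4, 2], 'Miler': [3]}}
```

Appending to a list keeps every position for a repeated misspelling instead of overwriting the earlier one.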
