Comparing multiple dictionaries in Python - python

I'm new to Python and am running into a problem I can't google my way out of. I've built a GUI using wxPython and ObjectiveListView. In its very center, the GUI has a list control displaying data in X rows (the data is loaded by the user) and in five columns.
When the user selects multiple entries from the list control (pressing CTRL or shift while clicking), the ObjectiveListView module gives me a list of dictionaries, the dictionaries containing the data in the rows of the list control. This is exactly what I want, good!
The returned list looks something like this:
print MyList
[{'id':1023, 'type':'Purchase', 'date':'23.8.2008', 'sum':'-21,90', 'target':'Apple Store'}, {'id':1024, 'type':'Purchase', 'date':'24.8.2008', 'sum':'-21,90', 'target':'Apple Store'}, {'id':23, 'type':'Purchase', 'date':'2.8.2008', 'sum':'-21,90', 'target':'Apple Store'}]
All the dictionaries have the same keys, but the values change. The 'id' value is unique. Here the problems start. I want to get the common values for all the items the user selected. In the above list they would be 'sum':'-21,90' and 'target':'Apple Store'.
I don't know how to properly compare the dicts in the list. One big problem is that I don't know beforehand how many dicts the list contains, since it's decided by the user.
I have a vague idea that list comprehensions would be the way to go, but I only know how to compare two lists with list comprehensions, not n lists. Any help would be appreciated.

>>> mysets = (set(x.items()) for x in MyList)
>>> reduce(lambda a,b: a.intersection(b), mysets)
set([('sum', '-21,90'), ('type', 'Purchase'), ('target', 'Apple Store')])
First, I've created a generator that will convert the list of dicts into an iterable sequence of sets of key,value pairs. You could use a list comprehension here but this way doesn't convert your entire list into yet another list, useful if you don't know how big it will be.
Then I've used reduce to apply a function that finds the common values between each set. It finds the intersection of set 1 & set 2, which is itself a set, then the intersection of that set & set 3 etc. The mysets generator will happily feed each set on demand to the reduce function until it's done.
reduce is no longer a built-in in Python 3, but it is still available as functools.reduce.
You could of course make it a one-liner by replacing mysets in the reduce with the generator expression, but that reduces the readability IMO. In practice I'd probably even go one step further and break the lambda out into its own line as well:
>>> mysets = (set(x.items()) for x in MyList)
>>> find_common = lambda a,b: a.intersection(b)
>>> reduce(find_common, mysets)
set([('sum', '-21,90'), ('type', 'Purchase'), ('target', 'Apple Store')])
And if you need the end result to be a dict, just wrap it like so (recreate mysets first, since a generator can only be consumed once):
>>> dict(reduce(find_common, mysets))
{'sum': '-21,90', 'type': 'Purchase', 'target': 'Apple Store'}
dict accepts any iterable of key,value pairs, such as the set of tuples returned at the end.
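On Python 3 the session above needs small changes, because reduce has to be imported and print is a function. A minimal sketch, assuming the same MyList as in the question (the name common is mine, not from the answer):

```python
from functools import reduce

MyList = [
    {'id': 1023, 'type': 'Purchase', 'date': '23.8.2008', 'sum': '-21,90', 'target': 'Apple Store'},
    {'id': 1024, 'type': 'Purchase', 'date': '24.8.2008', 'sum': '-21,90', 'target': 'Apple Store'},
    {'id': 23, 'type': 'Purchase', 'date': '2.8.2008', 'sum': '-21,90', 'target': 'Apple Store'},
]

# Each dict becomes a set of (key, value) tuples; this requires hashable values.
mysets = (set(d.items()) for d in MyList)

# Fold the sets together, keeping only the pairs present in every dict.
common = dict(reduce(lambda a, b: a.intersection(b), mysets))

print(common)
```

Note that reduce raises a TypeError on an empty sequence when no initial value is given, so an empty selection probably needs to be handled before this call.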

My answer is identical to Matthew Trevor's, except for one difference:
>>> mysets = (set(x.items()) for x in MyList)
>>> reduce(set.intersection, mysets)
set([('sum', '-21,90'), ('type', 'Purchase'), ('target', 'Apple Store')])
Here I use set.intersection instead of creating a new lambda. In my opinion this is more readable, as this intuitively reads as "reduce is reducing this list using the set intersection operator." This should also be much faster, as set.intersection is a built-in C function.
To fully answer your question, you can extract the values using a list comprehension:
>>> mysets = (set(x.items()) for x in MyList)
>>> result = reduce(set.intersection, mysets)
>>> values = [r[1] for r in result]
>>> values
['-21,90', 'Purchase', 'Apple Store']
This would end up on one line for me, but that's entirely up to you:
>>> [r[1] for r in reduce(set.intersection, (set(x.items()) for x in MyList))]
['-21,90', 'Purchase', 'Apple Store']

First, we need a function to compute intersection of two dictionaries:
def IntersectDicts( d1, d2 ) :
    return dict(filter(lambda (k,v) : k in d2 and d2[k] == v, d1.items()))
Then we can use it to process any number of dictionaries:
result = reduce(IntersectDicts, MyList)
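The lambda (k,v) form above relies on tuple parameter unpacking, which Python 3 removed. A possible Python 3 equivalent, as a sketch: the names intersect_dicts and rows, and the sample data, are illustrative and not from the answer. Because it compares values directly instead of putting them in a set, it should also cope with unhashable values such as lists.

```python
from functools import reduce


def intersect_dicts(d1, d2):
    """Return the items of d1 that appear with an equal value in d2."""
    return {k: v for k, v in d1.items() if k in d2 and d2[k] == v}


# Hypothetical rows; 'tags' holds a list, which a set-based approach could not hash.
rows = [
    {'id': 1, 'sum': '-21,90', 'tags': ['a']},
    {'id': 2, 'sum': '-21,90', 'tags': ['a']},
    {'id': 3, 'sum': '-21,90', 'tags': ['a']},
]

result = reduce(intersect_dicts, rows)
print(result)
```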

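Since you're only looking for the common set, you can compare the values in the first dictionary to the values under the same keys in all other dictionaries:
common = {}
for k in MyList[0]:
    for i in range(1, len(MyList)):
        if MyList[0][k] != MyList[i][k]:
            break  # the value differs somewhere, so k is not common
    else:
        # the inner loop ran to completion: every dictionary agrees on k
        common[k] = MyList[0][k]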
>>> common
{'sum': '-21,90', 'type': 'Purchase', 'target': 'Apple Store'}

Sorry, yes, 'type':'Purchase' is also one of the common values. Should have logged in to edit the question.

Related

How to sort a list containing frozensets (python)

I have a list of frozensets that I'd like to sort. Each of the frozensets contains a single integer value that results from an intersection operation between two sets:
k = frozenset(w) & frozenset(string.digits)
d[k] = w # w is the value
list(d) # sorted(d) doesn't work since the keys are sets and sets are unordered.
Here is the printed list:
[frozenset({'2'}), frozenset({'1'}), frozenset({'4'}), frozenset({'3'})]
How can I sort the list using the values contained in the sets?
You need to provide a function as the key argument to sorted that accepts a frozenset and returns something that can be compared. If each frozenset has exactly 1 element and that element is always a single digit, you can use the max function (it extracts that single element, since the sole element is always the biggest element of the frozenset), that is:
d1 = [frozenset({'2'}), frozenset({'1'}), frozenset({'4'}), frozenset({'3'})]
d2 = sorted(d1,key=max)
print(d2)
output
[frozenset({'1'}), frozenset({'2'}), frozenset({'3'}), frozenset({'4'})]
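If you want to know more, read the Sorting HOW TO.
The previous answers do not sort correctly when the strings have more than one digit, because strings are compared lexicographically. Convert the element to int instead: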
d = [frozenset({'224'}), frozenset({'346'}), frozenset({'2'}), frozenset({'22345'})]
sorted(d, key=lambda x: int(list(x)[0]))
Output:
[frozenset({'2'}),
frozenset({'224'}),
frozenset({'346'}),
frozenset({'22345'})]
Honestly, unless you really need to keep the elements as frozenset, the best might be to generate a list of values upstream ([2, 1, 4, 3]).
Anyway, to be able to sort the frozensets you need to make them ordered elements, for instance by converting them to tuples. You can do this transparently using the key parameter of sorted:
l = [frozenset({'2'}), frozenset({'1'}), frozenset({'4'}), frozenset({'3'})]
sorted(l, key=tuple)
or natsorted for strings with multiple digits:
from natsort import natsorted
l = [frozenset({'2'}), frozenset({'1'}), frozenset({'14'}), frozenset({'3'})]
natsorted(l, key=tuple)
output:
[frozenset({'1'}), frozenset({'2'}), frozenset({'3'}), frozenset({'14'})]
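If installing natsort is not an option, the same ordering can probably be had with the standard library alone. A minimal sketch, assuming every frozenset holds exactly one string of digits; the helper name sole_int is hypothetical:

```python
def sole_int(fs):
    """Return the single element of fs as an int."""
    (value,) = fs  # raises ValueError unless fs holds exactly one element
    return int(value)


data = [frozenset({'2'}), frozenset({'1'}), frozenset({'14'}), frozenset({'3'})]

by_string = sorted(data, key=max)       # keys are strings, so '14' sorts before '2'
by_number = sorted(data, key=sole_int)  # keys are ints, so 14 sorts after 3

print(by_string)
print(by_number)
```

The unpacking in sole_int fails loudly if a frozenset ever contains zero or several elements, which seems preferable to silently picking one.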

A list of a lists into a single string

Suppose I have a list of lists say
A = [[1,1,1,1],[2,2,2,2]]
and I want to create two strings from that to be
'1111'
'2222'
How would we do this in python?
Maybe list comprehension:
>>> A = [[1,1,1,1],[2,2,2,2]]
>>> l=[''.join(map(str,i)) for i in A]
>>> l
['1111', '2222']
>>>
Now you've got it.
This is pretty easily done using join and a list comprehension.
A = [[1,1,1,1],[2,2,2,2]]
a_strings = [''.join(map(str, sublist)) for sublist in A]
See, join() takes an iterable of strings and concatenates them into one string, map(str, ...) turns the ints into strings first, and the list comprehension just loops over the sub-lists. Above I combined them all together.
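To see why the map(str, ...) step matters, here is a small sketch (the variable join_error is only there for illustration): str.join rejects non-string items, so joining the ints directly fails.

```python
A = [[1, 1, 1, 1], [2, 2, 2, 2]]

# Joining ints directly fails, because str.join only accepts strings.
try:
    ''.join(A[0])
    join_error = None
except TypeError as exc:
    join_error = exc

# Converting each item with str first makes the join work.
a_strings = [''.join(map(str, sublist)) for sublist in A]

print(join_error)
print(a_strings)
```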
On second thought
map() is often considered more efficient (when not combined with a lambda) and, for some, more readable. I'll just add an approach that uses map for both the conversion and the join.
a_strings = map(''.join, (map(str, sublist) for sublist in A))
The inner map converts the ints of one sub-list to strs; the outer map then joins the strs of each sub-list. On Python 3, map returns an iterator, so wrap the result in list() if you need a list.
Hopefully this makes things a bit more chewable for ya, each method is close to equivalent such that for this case you could consider them style choices.

Get related dictionaries from lists

I have two lists of different dictionaries (ListA and ListB).
All dictionaries in listA have field "id" and "external_id"
All dictionaries in listB have field "num" and "external_num"
I need to get all pairs of dictionaries where value of external_id = num and value of external_num = id.
I can achieve that using this code:
for dictA in ListA:
    for dictB in ListB:
        if dictA["id"] == dictB["external_num"] and dictA["external_id"] == dictB["num"]:
            pass  # dictA and dictB form a matching pair
But I have seen many elegant Python expressions, and I guess it is possible to get that result in a more Pythonic style, isn't it?
Something like:
res = [A, B for A, B in listA, listB if A['id'] == B['extnum'] and A['ext'] == B['num']]
You are pretty close, but you aren't telling Python how you want to connect the two lists to get the pairs of dictionaries A and B.
If you want to compare all dictionaries in ListA to all in ListB, you need itertools.product:
from itertools import product
res = [(A, B) for A, B in product(ListA, ListB) if ...]
Alternatively, if you want pairs at the same indices, use zip:
res = [(A, B) for A, B in zip(ListA, ListB) if ...]
If you don't need the whole list building at once, note that you can use itertools.ifilter to pick the pairs you want:
from itertools import ifilter, product
for A, B in ifilter(lambda (A, B): ...,
                    product(ListA, ListB)):
    # do whatever you want with A and B
(If you do this with zip, use itertools.izip instead to maximise performance.)
Notes on Python 3.x:
zip and filter no longer return lists, therefore itertools.izip and itertools.ifilter no longer exist (just as range has pushed out xrange) and you only need product from itertools; and
lambda (A, B): is no longer valid syntax; you will need to write the filtering function to take a single tuple argument lambda t: and e.g. replace A with t[0].
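Putting those Python 3 notes together, the lazy filtering might look like the sketch below. The sample ListA and ListB contents and the function name is_match are made up for illustration; the field names come from the question.

```python
from itertools import product

# Made-up sample data using the question's field names.
ListA = [{'id': 1, 'external_id': 10}, {'id': 2, 'external_id': 20}]
ListB = [{'num': 10, 'external_num': 1}, {'num': 20, 'external_num': 99}]


def is_match(pair):
    """Take one (A, B) tuple, since Python 3 cannot unpack it in the signature."""
    A, B = pair
    return A['id'] == B['external_num'] and A['external_id'] == B['num']


# The built-in filter is lazy on Python 3; list() is used here only to show the result.
matches = list(filter(is_match, product(ListA, ListB)))
print(matches)
```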
Firstly, for code clarity, I actually would probably go with your first option - I don't think using for loops is particularly un-Pythonic, in this case. However, if you want to try using a list comprehension, there are a few things to be aware of:
Each item returned by the list comprehension needs to be just a singular item. Trying to return A, B is going to give you a SyntaxError. However, you can return either a list or a tuple (or anything else, that is a single object), so something like res = [(A,B) for...] would start working.
Another concern is how you're iterating over these lists - from your first snippet of code, it appears you don't make any assumptions about these lists lining up, meaning: you seem to be ok if the 2nd item in listA matches the 14th item in listB, so long as they match on the appropriate fields. That's perfectly reasonable, but just be aware that means you will need two for loops no matter how you try to do it*. And you still need your comparisons. So, as a list comprehension, you might try:
res = [(A, B) for A in listA for B in listB if A['id']==B['extnum'] and A['extid']==B['num']]
Then, in res, you'll have 0 or more tuples, and each tuple will contain the respective dictionaries you're interested in. To use them:
for tup in res:
    A = tup[0]
    B = tup[1]
    #....
or more concisely (and Pythonically):
for A,B in res:
    #...
since Python is smart enough to know that it's yielding an item (the tuple) that has 2 elements, and so it can directly assign them to A and B.
EDIT:* In retrospect, it isn't completely true that you need two for loops, and if your lists are big enough, it may be helpful, performance-wise, to make an intermediate dictionary such as this:
# make a dictionary with key=tuple, value=dictionary
interim = {(A['id'], A['extid']): A for A in listA}
for B in listB:
    tup = (B['extnum'], B['num'])  # order matters! match-up with A
    if tup in interim:
        A = interim[tup]
        print(A, B)
and, if the id-extid pair is not expected to be unique across all items in listA, then you'd want to look into collections.defaultdict with a list... but I'm not sure this still fits in the 'more Pythonic' category.
I realize this is likely overkill for the question you asked, but I couldn't let my 'two for loops' statement stand, since it's not entirely true.
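For the non-unique case mentioned above, a collections.defaultdict(list) version could look like this. It is a sketch under the same key names as the answer ('id', 'extid', 'extnum', 'num'); the sample rows and the 'name' field are invented for the demonstration.

```python
from collections import defaultdict

# Invented sample data; two entries in listA share the same (id, extid) pair.
listA = [
    {'id': 1, 'extid': 10, 'name': 'first'},
    {'id': 1, 'extid': 10, 'name': 'duplicate'},
    {'id': 2, 'extid': 20, 'name': 'other'},
]
listB = [{'num': 10, 'extnum': 1}, {'num': 30, 'extnum': 3}]

# Group every A under its (id, extid) key instead of keeping only the last one.
interim = defaultdict(list)
for A in listA:
    interim[(A['id'], A['extid'])].append(A)

pairs = []
for B in listB:
    # .get avoids creating empty entries for keys that have no match
    for A in interim.get((B['extnum'], B['num']), []):
        pairs.append((A, B))

print(pairs)
```

This stays a single pass over each list, at the cost of holding the grouped dictionary in memory.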

python: union keys from multiple dictionary?

I have 5 dictionaries and I want a union of their keys.
alldict = [dict1, dict2, dict3, dict4, dict5]
I tried
allkey = reduce(lambda x, y: set(x.keys()).union(y.keys()), alldict)
but it gave me an error
AttributeError: 'set' object has no attribute 'keys'
Am I doing it wrong? I am using a normal for loop now, but I wonder why the above code didn't work.
I think @chuck already answered the question of why it doesn't work, but a simpler way to do this would be to remember that the union method can take multiple arguments:
allkey = set().union(*alldict)
does what you want without any loops or lambdas.
Your solution works for the first two elements in the list, but then dict1 and dict2 got reduced into a set and that set is put into your lambda as the x. So now x does not have the method keys() anymore.
The solution is to make x be a set from the very beginning by initializing the reduction with an empty set (which happens to be the neutral element of the union).
Try it with an initializer:
allkey = reduce(lambda x, y: x.union(y.keys()), alldict, set())
An alternative without any lambdas would be:
allkey = reduce(set.union, map(set, map(dict.keys, alldict)))
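The failure and the initializer fix described above can be reproduced with a few small dictionaries. This is a Python 3 sketch; the contents of alldict and the names failure, via_reduce and via_union are made up for the demonstration.

```python
from functools import reduce

alldict = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}, {'d': 5}]

# Original attempt: after the first step x is a set, so x.keys() fails.
try:
    reduce(lambda x, y: set(x.keys()).union(y.keys()), alldict)
    failure = None
except AttributeError as exc:
    failure = exc

# With set() as the initial value, x is a set on every call.
via_reduce = reduce(lambda x, y: x.union(y.keys()), alldict, set())

# set.union accepts any iterables, and iterating a dict yields its keys.
via_union = set().union(*alldict)

print(failure)
print(sorted(via_reduce), sorted(via_union))
```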
A simple strategy for non-functional neurons (pun intended):
allkey = []
for dictio in alldict:
    for key in dictio:
        allkey.append(key)
allkey = set(allkey)
We can convert this code to a much shorter form using set comprehensions:
allkey = {key for dictio in alldict for key in dictio}
This one-liner is still very readable in comparison with the conventional for loop.
The key to convert a nested loop to a list or set comprehension is to write the inner loop (the one that varies faster in the nested loop) as the last index (that is, for key in dictio).
set().union(dict1.keys(),dict2.keys()...)
I tried the list and it didn't work, so I'm just putting this up here for anyone.
Just one more way, 'cause what the hay:
a={}; [ a.update(b) for b in alldict ] and a.keys()
or the slightly-more-mysterious
reduce(lambda a, b: a.update(b) or a, alldict, {}).keys()
(I'm bummed that there's no built-in function equivalent to
def f(a,b):
    r = {}
    r.update(a)
    r.update(b)
    return r
is there?)
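On newer Python versions there is something close to that: dictionary unpacking in a dict display (Python 3.5+) builds a merged copy, and Python 3.9 added the a | b operator for the same purpose. A small sketch comparing the unpacking form with the function above; the name merged and the sample dictionaries are illustrative.

```python
def merged(a, b):
    """Same behaviour as f above: copy a, then let b override it."""
    r = {}
    r.update(a)
    r.update(b)
    return r


a = {'x': 1, 'y': 2}
b = {'y': 20, 'z': 30}

# Python 3.5+: later unpackings win on duplicate keys, like update().
unpacked = {**a, **b}

print(unpacked)
print(sorted(unpacked.keys()))
```

Neither form modifies a or b, unlike a.update(b).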
If you only want to union keys of 2 dicts you could use operator |.
Quote from docs:
Return a new set with elements from the set and all others.
Example:
all_keys = (dict1.keys() | dict2.keys())

difference between 2 pieces Python code

I'm doing an exercise as follows:
# B. front_x
# Given a list of strings, return a list with the strings
# in sorted order, except group all the strings that begin with 'x' first.
# e.g. ['mix', 'xyz', 'apple', 'xanadu', 'aardvark'] yields
# ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
# Hint: this can be done by making 2 lists and sorting each of them
# before combining them.
sample solution:
def front_x(words):
    listX = []
    listO = []
    for w in words:
        if w.startswith('x'):
            listX.append(w)
        else:
            listO.append(w)
    listX.sort()
    listO.sort()
    return listX + listO
my solution:
def front_x(words):
    listX = []
    for w in words:
        if w.startswith('x'):
            listX.append(w)
            words.remove(w)
    listX.sort()
    words.sort()
    return listX + words
When I tested my solution, the result was a little weird. Here is the source code with my solution: http://dl.dropbox.com/u/559353/list1.py. You might want to try it out.
The problem is that you loop over the list and remove elements from it (modifying it):
for w in words:
    if w.startswith('x'):
        listX.append(w)
        words.remove(w)
Example:
>>> a = list(range(5))
>>> for i in a:
...     a.remove(i)
...
>>> a
[1, 3]
This code works as follows:
Get first element, remove it.
Move to the next element. But it is not 1 anymore, because we removed 0 previously and thus 1 became the new first element. The next element is therefore 2, and 1 is skipped.
Same for 3 and 4.
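The steps above can be traced by recording which elements the loop actually visits. A short Python 3 sketch (the names visited and b are mine), together with one common workaround, iterating over a copy:

```python
a = list(range(5))
visited = []
for i in a:
    visited.append(i)  # record what the loop really sees
    a.remove(i)

print(visited)  # the loop only ever sees every other element
print(a)

# Iterating over a shallow copy leaves the loop unaffected by the removals.
b = list(range(5))
for i in b[:]:
    b.remove(i)

print(b)
```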
Two main differences:
Removing an element from a list inside a loop that iterates over that same list doesn't quite work in Python. If you were using Java, you would get an exception saying that you are modifying a collection that is being iterated. Python doesn't raise an error for lists; it silently skips elements. @Felix Kling explains it quite well in his answer.
Also, you are modifying the input parameter words, so the caller of your function front_x will see words modified after the function runs. Unless this behaviour is explicitly expected, it is better avoided. Imagine that your program is doing something else with words. Keeping two lists, as in the sample solution, is a better approach.
Altering the list you're iterating over results in surprising behaviour: elements are silently skipped. That's why the sample solution creates two new lists instead of deleting from the source list.
for w in words:
    if w.startswith('x'):
        listX.append(w)
        words.remove(w) # Problem here!
See this question for a discussion on this matter. It basically boils down to list iterators iterating through the indexes of the list without going back and checking for modifications (which would be expensive!).
If you want to avoid creating a second list, you will have to perform two iterations. One to iterate over words to create listX and another to iterate over listX deleting from words.
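The two-iteration idea might be sketched as follows. It keeps the asker's approach of deleting from words, so it still modifies the caller's list; the name list_x is just a restyled listX.

```python
def front_x(words):
    # First pass: collect the 'x' words without touching words.
    list_x = [w for w in words if w.startswith('x')]
    # Second pass: iterate over list_x (not words) while deleting from words.
    for w in list_x:
        words.remove(w)
    list_x.sort()
    words.sort()
    return list_x + words


print(front_x(['mix', 'xyz', 'apple', 'xanadu', 'aardvark']))
print(front_x(['xx', 'xa', 'b']))  # consecutive 'x' words are no longer skipped
```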
That hint is misleading and unnecessary, you can do this without sorting and combining two lists independently:
>>> items = ['mix', 'xyz', 'apple', 'xanadu', 'aardvark']
>>> sorted(items, key=lambda item: (item[0]!='x', item))
['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
The built-in sorted() function takes an optional key argument that tells it what to sort by. In this case, you want to create tuples like (False, 'xanadu') or (True, 'apple') for each element of the original list, which you can do with a lambda. Since False sorts before True, the strings starting with 'x' come first.
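To make the key tuples visible, the sketch below builds them explicitly. The function name front_x_key is hypothetical, and it uses startswith rather than item[0] so that an empty string would not raise an IndexError.

```python
items = ['mix', 'xyz', 'apple', 'xanadu', 'aardvark']


def front_x_key(item):
    """False for words starting with 'x', so they sort ahead of the rest."""
    return (not item.startswith('x'), item)


keys = [front_x_key(item) for item in items]
result = sorted(items, key=front_x_key)

print(keys)
print(result)
```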
