Given a list of lists of dicts:
A = [
[{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
[{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 3, 'y': 13}, {'x': 4, 'y': 0}],
[{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}]
]
I need to retrieve the highest 'y' values from each of the list of dicts...so the resulting list would contain:
Z = [(4, 7), (3,13), (1,20)]
In A, each dict's 'x' acts as the key and its 'y' as the value.
Any ideas? Thank you.
max accepts an optional key parameter.
A = [
[{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
[{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 3, 'y': 13}, {'x': 4, 'y': 0}],
[{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}]
]
Z = []
for a in A:
    d = max(a, key=lambda d: d['y'])  # dict with the largest 'y' in this row
    Z.append((d['x'], d['y']))
print(Z)
UPDATE
suggested by – J.F. Sebastian:
from operator import itemgetter
Z = [itemgetter(*'xy')(max(lst, key=itemgetter('y'))) for lst in A]
I'd use itemgetter and max's key argument:
from operator import itemgetter
pair_getter = itemgetter('x', 'y')
[pair_getter(max(d, key=itemgetter('y'))) for d in A]
[max(((d['x'], d['y']) for d in l), key=lambda t: t[1]) for l in A]
The solution to your stated problem has been given, but I suggest changing your underlying data structure. Tuples are much faster for small elements such as a point. You may retain the clarity of a dictionary by using namedtuple if you so desire.
>>> from collections import namedtuple
>>> A = [
[{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
[{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 3, 'y': 13}, {'x': 4, 'y': 0}],
[{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}]
]
Making a Point namedtuple is simple
>>> Point = namedtuple('Point', 'x y')
This is what an instance looks like
>>> Point(x=1, y=0) # Point(1, 0) also works
Point(x=1, y=0)
A would then look like this
>>> A = [[Point(**y) for y in x] for x in A]
>>> A
[[Point(x=1, y=0), Point(x=2, y=3), Point(x=3, y=4), Point(x=4, y=7)],
[Point(x=1, y=0), Point(x=2, y=2), Point(x=3, y=13), Point(x=4, y=0)],
[Point(x=1, y=20), Point(x=2, y=4), Point(x=3, y=0), Point(x=4, y=8)]]
Now working like this is much easier:
>>> from operator import attrgetter
>>> [max(row, key=attrgetter('y')) for row in A]
[Point(x=4, y=7), Point(x=3, y=13), Point(x=1, y=20)]
To retain the speed advantages of tuples it's better to access by index:
>>> from operator import itemgetter
>>> [max(row, key=itemgetter(1)) for row in A]  # y is at index 1
[Point(x=4, y=7), Point(x=3, y=13), Point(x=1, y=20)]
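result = []
for item in A:
    # sort each inner list by 'y', largest first, and take the first entry
    new = sorted(item, key=lambda k: k['y'], reverse=True)
    result.append((new[0]['x'], new[0]['y']))
print(result)
Note: this is not the most efficient way to do it (sorting costs O(n log n), while max needs only one pass), but it is one way to get the required result.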
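The single-pass alternative to sorting can be sketched as follows. This is only a minimal sketch: the helper name top_points and the decision to skip empty inner lists are assumptions for illustration, not something the question specifies.

```python
def top_points(rows):
    """Return (x, y) for the dict with the largest 'y' in each row."""
    result = []
    for row in rows:
        if not row:  # assumption: silently skip empty rows
            continue
        best = max(row, key=lambda d: d['y'])  # one pass, no sorting
        result.append((best['x'], best['y']))
    return result

A = [
    [{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
    [{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 3, 'y': 13}, {'x': 4, 'y': 0}],
    [{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}],
]
Z = top_points(A)
print(Z)  # [(4, 7), (3, 13), (1, 20)]
```

If several dicts in a row share the largest 'y', max returns the first of them.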
Related
Depending on your view on my approach this is either a question about using np.unique() on awkward1 arrays or a call for a better approach:
Let a and b be two awkward1 arrays of the same outer length (number of events) but different inner lengths. For example:
a = [[1, 2], [3] , [] , [4, 5, 6]]
b = [[7] , [3, 5], [6], [8, 9]]
Let f: (x, y) -> z be a function that acts on two numbers x and y and results in the number z. For example:
f(x, y):= y - x
The idea is to compare every element in a with every element in b via f for each event and filter out the matches of a and b pairs that survive some cut applied to f. For example:
f(x, y) < 4
My approach for this is:
a = ak.from_iter(a)
b = ak.from_iter(b)
c = ak.cartesian({'x':a, 'y':b})
#c= [[{'x': 1, 'y': 7}, {'x': 2, 'y': 7}], [{'x': 3, 'y': 3}, {'x': 3, 'y': 5}], [], [{'x': 4, 'y': 8}, {'x': 4, 'y': 9}, {'x': 5, 'y': 8}, {'x': 5, 'y': 9}, {'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]
i = ak.argcartesian({'x':a, 'y':b})
#i= [[{'x': 0, 'y': 0}, {'x': 1, 'y': 0}], [{'x': 0, 'y': 0}, {'x': 0, 'y': 1}], [], [{'x': 0, 'y': 0}, {'x': 0, 'y': 1}, {'x': 1, 'y': 0}, {'x': 1, 'y': 1}, {'x': 2, 'y': 0}, {'x': 2, 'y': 1}]]
diff = c['y'] - c['x']
#diff= [[6, 5], [0, 2], [], [4, 5, 3, 4, 2, 3]]
cut = diff < 4
#cut= [[False, False], [True, True], [], [False, False, True, False, True, True]]
new = c[cut]
#new= [[], [{'x': 3, 'y': 3}, {'x': 3, 'y': 5}], [], [{'x': 5, 'y': 8}, {'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]
new_i = i[cut]
#new_i= [[], [{'x': 0, 'y': 0}, {'x': 0, 'y': 1}], [], [{'x': 1, 'y': 0}, {'x': 2, 'y': 0}, {'x': 2, 'y': 1}]]
It is possible that pairs with the same element from a but different elements from b survive the cut. (e.g. {'x': 3, 'y': 3} and {'x': 3, 'y': 5})
My goal is to group those pairs with the same element from a together and therefore reshape the new array into:
new = [[], [{'x': 3, 'y': [3, 5]}], [], [{'x': 5, 'y': 8}, {'x': 6, 'y': [8, 9]}]]
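To pin down the target shape, here is a plain-Python reference sketch (no awkward1, not vectorized) that builds the grouping directly from the two input lists. The function name group_matches is made up for illustration, and it always stores 'y' as a list, even for a single match, which is an assumption that differs slightly from the mixed form written above.

```python
def group_matches(a, b, cut=lambda x, y: y - x < 4):
    """Plain-Python reference: per event, group surviving y values by x."""
    grouped = []
    for xs, ys in zip(a, b):  # one (xs, ys) pair per event
        event = []
        for x in xs:
            kept = [y for y in ys if cut(x, y)]
            if kept:  # drop x values with no surviving pair
                event.append({'x': x, 'y': kept})
        grouped.append(event)
    return grouped

a = [[1, 2], [3], [], [4, 5, 6]]
b = [[7], [3, 5], [6], [8, 9]]
print(group_matches(a, b))
# [[], [{'x': 3, 'y': [3, 5]}], [], [{'x': 5, 'y': [8]}, {'x': 6, 'y': [8, 9]}]]
```

Because the loop walks over the positions of a, two equal values in a would still end up in separate groups.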
My only idea how to achieve this is to create a list of the indexes from a that are still present after the cut by using new_i:
i = new_i['x']
#i= [[], [0, 0], [], [1, 2, 2]]
However, I need a unique version of this list so that every index appears only once. This could be achieved with np.unique() in NumPy, but it doesn't work in awkward1:
np.unique(i)
<__array_function__ internals> in unique(*args, **kwargs)
TypeError: no implementation found for 'numpy.unique' on types that implement __array_function__: [<class 'awkward1.highlevel.Array'>]
My question:
Is there an np.unique() equivalent in awkward1, and/or would you recommend a different approach to my problem?
Okay, I still don't know how to use np.unique() on my arrays, but I found a solution for my own problem:
In my previous approach I used the following code to pair up both arrays.
c = ak.cartesian({'x':a, 'y':b})
#c= [[{'x': 1, 'y': 7}, {'x': 2, 'y': 7}], [{'x': 3, 'y': 3}, {'x': 3, 'y': 5}], [], [{'x': 4, 'y': 8}, {'x': 4, 'y': 9}, {'x': 5, 'y': 8}, {'x': 5, 'y': 9}, {'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]
However, with the nested = True parameter from ak.cartesian() I get a list grouped by the elements of a:
c = ak.cartesian({'x':a, 'y':b}, axis = 1, nested = True)
#c= [[[{'x': 1, 'y': 7}], [{'x': 2, 'y': 7}]], [[{'x': 3, 'y': 3}, {'x': 3, 'y': 5}]], [], [[{'x': 4, 'y': 8}, {'x': 4, 'y': 9}], [{'x': 5, 'y': 8}, {'x': 5, 'y': 9}], [{'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]]
After the cut I end up with:
new = c[cut]
#new= [[[], []], [[{'x': 3, 'y': 3}, {'x': 3, 'y': 5}]], [], [[], [{'x': 5, 'y': 8}], [{'x': 6, 'y': 8}, {'x': 6, 'y': 9}]]]
I extract the y values and reduce the innermost layer of the nested lists of new to only one element:
y = new['y']
#y= [[[], []], [[3, 5]], [], [[], [8], [8, 9]]]
new = ak.firsts(new, axis = 2)
#new= [[None, None], [{'x': 3, 'y': 3}], [], [None, {'x': 5, 'y': 8}, {'x': 6, 'y': 8}]]
(I tried to use ak.firsts() with axis = -1 but it seems to be not implemented yet.)
Now every innermost entry in new belongs to exactly one element from a. By replacing the current y of new with the previously extracted y I end up with my desired result:
new['y'] = y
#new= [[None, None], [{'x': 3, 'y': [3, 5]}], [], [None, {'x': 5, 'y': [8]}, {'x': 6, 'y': [8, 9]}]]
Anyway, should you know a better solution, I'd be pleased to hear it.
Using Python, I have to get all the permutations of a given subset.
I used itertools.permutations, but the result is a bit different from what I need.
Think of a machine with a maximum capacity. We have products that can be produced together, and we have to fill the capacity of the machine.
The output format is not important; I used a dictionary to describe it. I will run a calculation after getting these combinations.
For example :
products = {'x','y','z','a'}
machine_capacity = 8
#required output as follows:
{'x':5,'y':1,'z':1,'a':1}
{'x':4,'y':2,'z':1,'a':1}
{'x':4,'y':1,'z':2,'a':1}
{'x':4,'y':1,'z':1,'a':2}
{'x':3,'y':3,'z':1,'a':1}
{'x':3,'y':1,'z':3,'a':1}
{'x':3,'y':1,'z':1,'a':3}
{'x':3,'y':2,'z':2,'a':1}
{'x':3,'y':2,'z':1,'a':2}
{'x':3,'y':1,'z':2,'a':2}
{'x':2,'y':4,'z':1,'a':1}
# ...
{'x':6,'y':1,'z':1} # This can't be in the results, since we need at least 1 of each product
{'x':4,'y':1,'z':1,'a':1} # This can't be in the results, since we need to fill the capacity
And we don't want repeating elements:
{'x':5,'y':1,'z':1,'a':1}
and
{'a':1,'y':1,'z':1,'x':5}
are the same thing for us.
Here is a solution not relying on itertools since it's getting contrived with all the constraints (a product yielding unique results and a minimum of 1 appearance per product):
products = {'x','y','z','a'}
machine_capacity = 8

def genCap(capacity=machine_capacity, used=0):
    # the last product takes whatever capacity remains
    if used == len(products) - 1:
        yield capacity, None
    else:
        # leave at least 1 unit for every product not yet assigned
        for i in range(1, 2 + capacity - len(products) + used):
            yield i, genCap(capacity - i, used + 1)

def printCaps(caps, current=[]):
    if caps is None:
        print(dict(zip(products, current)))
        return
    for i in caps:
        printCaps(i[1], current + [i[0]])

printCaps(genCap())
This might be optimizable with tail recursion and the like. It looks almost like groupby, but I can't see an easy way to use that.
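If itertools is still preferred, one possible sketch uses the "stars and bars" idea: choosing len(products) - 1 cut points among the capacity - 1 gaps yields every split of the capacity into positive parts exactly once. The function name fills and the use of a list instead of a set (to get a stable product order) are assumptions for illustration, not part of the answer above.

```python
from itertools import combinations

def fills(products, capacity):
    """Yield every way to fill `capacity` with at least 1 of each product."""
    k = len(products)
    # pick k - 1 cut points strictly between 0 and capacity
    for cuts in combinations(range(1, capacity), k - 1):
        bounds = (0,) + cuts + (capacity,)
        # the gaps between consecutive bounds are the per-product amounts
        parts = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
        yield dict(zip(products, parts))

products = ['x', 'y', 'z', 'a']
results = list(fills(products, 8))
print(len(results))  # 35
print(results[-1])   # {'x': 5, 'y': 1, 'z': 1, 'a': 1}
```

Each cut-point choice maps to a different split, so no duplicate filtering is needed.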
For posterity I leave my old solution. product repeats counts, so filtering out the duplicates becomes a problem of its own:
You confused product with permutation. Here is a quick solution using itertools product, and the Counter collection to create the output you want:
from collections import Counter
from itertools import product
products = {'x','y','z','a'}
machine_capacity=8
for x in filter(lambda x: len(x) == len(products),
                map(Counter, product(products, repeat=machine_capacity))):
    print(dict(x))
Note both product and map are lazy, so they won't be evaluated until you need them. Counter provides the output you want, and converting to dict cleans it up. Note no order is guaranteed anywhere. The filter is used to make sure all your products appear at least once (length of counter equals that of products) - and it is also lazy, so only evaluated when you need it.
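Because product emits one tuple per ordering, the same counts show up many times. A hedged sketch of one way to keep only the first occurrence, using a frozenset of the Counter items as a hashable key; the function name unique_fills and the smaller example (3 products, capacity 5) are assumptions chosen to keep the run short, since the number of tuples grows as len(products) ** capacity.

```python
from collections import Counter
from itertools import product

def unique_fills(products, capacity):
    """Yield each distinct fill of the machine exactly once."""
    seen = set()
    for combo in product(sorted(products), repeat=capacity):
        counts = Counter(combo)
        if len(counts) != len(products):
            continue  # some product is missing from this combo
        key = frozenset(counts.items())  # hashable, order-independent
        if key not in seen:
            seen.add(key)
            yield dict(counts)

small = list(unique_fills({'x', 'y', 'z'}, 5))
print(len(small))  # 6
```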
You can use a recursive function to find all possible combinations of the values in range(machine_capacity) that both sum to 8 and are unique. Then, the elements in products can be mapped to each element in the sublists of the combinations found:
products = ['x','y','z','a']
machine_capacity = 8
def combinations(d, current = []):
    if len(current) == len(products):
        yield current
    else:
        for i in range(machine_capacity):
            if sum(current+[i]) <= machine_capacity:
                yield from combinations(d, current+[i])

data = [dict(zip(products, i)) for i in filter(lambda x:sum(x) == 8 and len(x) == len(set(x)), combinations(machine_capacity))]
Output:
[{'a': 5, 'x': 0, 'z': 2, 'y': 1}, {'a': 4, 'x': 0, 'z': 3, 'y': 1}, {'a': 3, 'x': 0, 'z': 4, 'y': 1}, {'a': 2, 'x': 0, 'z': 5, 'y': 1}, {'a': 5, 'x': 0, 'z': 1, 'y': 2}, {'a': 1, 'x': 0, 'z': 5, 'y': 2}, {'a': 4, 'x': 0, 'z': 1, 'y': 3}, {'a': 1, 'x': 0, 'z': 4, 'y': 3}, {'a': 3, 'x': 0, 'z': 1, 'y': 4}, {'a': 1, 'x': 0, 'z': 3, 'y': 4}, {'a': 2, 'x': 0, 'z': 1, 'y': 5}, {'a': 1, 'x': 0, 'z': 2, 'y': 5}, {'a': 5, 'x': 1, 'z': 2, 'y': 0}, {'a': 4, 'x': 1, 'z': 3, 'y': 0}, {'a': 3, 'x': 1, 'z': 4, 'y': 0}, {'a': 2, 'x': 1, 'z': 5, 'y': 0}, {'a': 5, 'x': 1, 'z': 0, 'y': 2}, {'a': 0, 'x': 1, 'z': 5, 'y': 2}, {'a': 4, 'x': 1, 'z': 0, 'y': 3}, {'a': 0, 'x': 1, 'z': 4, 'y': 3}, {'a': 3, 'x': 1, 'z': 0, 'y': 4}, {'a': 0, 'x': 1, 'z': 3, 'y': 4}, {'a': 2, 'x': 1, 'z': 0, 'y': 5}, {'a': 0, 'x': 1, 'z': 2, 'y': 5}, {'a': 5, 'x': 2, 'z': 1, 'y': 0}, {'a': 1, 'x': 2, 'z': 5, 'y': 0}, {'a': 5, 'x': 2, 'z': 0, 'y': 1}, {'a': 0, 'x': 2, 'z': 5, 'y': 1}, {'a': 1, 'x': 2, 'z': 0, 'y': 5}, {'a': 0, 'x': 2, 'z': 1, 'y': 5}, {'a': 4, 'x': 3, 'z': 1, 'y': 0}, {'a': 1, 'x': 3, 'z': 4, 'y': 0}, {'a': 4, 'x': 3, 'z': 0, 'y': 1}, {'a': 0, 'x': 3, 'z': 4, 'y': 1}, {'a': 1, 'x': 3, 'z': 0, 'y': 4}, {'a': 0, 'x': 3, 'z': 1, 'y': 4}, {'a': 3, 'x': 4, 'z': 1, 'y': 0}, {'a': 1, 'x': 4, 'z': 3, 'y': 0}, {'a': 3, 'x': 4, 'z': 0, 'y': 1}, {'a': 0, 'x': 4, 'z': 3, 'y': 1}, {'a': 1, 'x': 4, 'z': 0, 'y': 3}, {'a': 0, 'x': 4, 'z': 1, 'y': 3}, {'a': 2, 'x': 5, 'z': 1, 'y': 0}, {'a': 1, 'x': 5, 'z': 2, 'y': 0}, {'a': 2, 'x': 5, 'z': 0, 'y': 1}, {'a': 0, 'x': 5, 'z': 2, 'y': 1}, {'a': 1, 'x': 5, 'z': 0, 'y': 2}, {'a': 0, 'x': 5, 'z': 1, 'y': 2}]
I have a list of dictionary items
[{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 2}]
I want an array of "arrays of dictionaries" containing every permutation of the list. For the list above that would be 3! = 6 orderings:
[[{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 2}],
[{'x': 0, 'y': 0}, {'x': 2, 'y': 2}, {'x': 1, 'y': 0}],
[{'x': 1, 'y': 0}, {'x': 0, 'y': 0}, {'x': 2, 'y': 2}],
[{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 0, 'y': 0}],
[{'x': 2, 'y': 2}, {'x': 1, 'y': 0}, {'x': 0, 'y': 0}],
[{'x': 2, 'y': 2}, {'x': 0, 'y': 0}, {'x': 1, 'y': 0}]]
itertools can do permutations
#!python2
import itertools
yourlist = [{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 2}]
for seq in itertools.permutations(yourlist):
    print(seq)
'''
({'y': 0, 'x': 0}, {'y': 0, 'x': 1}, {'y': 2, 'x': 2})
({'y': 0, 'x': 0}, {'y': 2, 'x': 2}, {'y': 0, 'x': 1})
({'y': 0, 'x': 1}, {'y': 0, 'x': 0}, {'y': 2, 'x': 2})
({'y': 0, 'x': 1}, {'y': 2, 'x': 2}, {'y': 0, 'x': 0})
({'y': 2, 'x': 2}, {'y': 0, 'x': 0}, {'y': 0, 'x': 1})
({'y': 2, 'x': 2}, {'y': 0, 'x': 1}, {'y': 0, 'x': 0})
'''
If, despite the comments, you are still unsure how to solve your issue, consider the following.
Strategy: Use permutations from itertools, which yields tuples in this case. Then iterate through the result and convert each tuple to a list to match your required output.
Here is how you could do it:
>>> import itertools
>>> lst = [{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 2}]
>>> [list(elem) for elem in list(itertools.permutations(lst))]
[[{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 2}],
[{'x': 0, 'y': 0}, {'x': 2, 'y': 2}, {'x': 1, 'y': 0}],
[{'x': 1, 'y': 0}, {'x': 0, 'y': 0}, {'x': 2, 'y': 2}],
[{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 0, 'y': 0}],
[{'x': 2, 'y': 2}, {'x': 0, 'y': 0}, {'x': 1, 'y': 0}],
[{'x': 2, 'y': 2}, {'x': 1, 'y': 0}, {'x': 0, 'y': 0}]]
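Since the number of orderings grows as n!, it can help to consume permutations lazily instead of building the whole list at once. A small sketch, assuming the same lst as above; the variable names are illustrative. It also shows that each permutation refers to the original dict objects rather than copies.

```python
import itertools
import math

lst = [{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 2}]

# number of orderings that a full expansion would produce
expected = math.factorial(len(lst))

# take only the first two orderings without materialising the rest
first_two = [list(p) for p in itertools.islice(itertools.permutations(lst), 2)]

# the permuted lists share the original dicts (no copies are made)
shared = first_two[0][0] is lst[0]
print(expected, shared)  # 6 True
```

If independent dicts are needed, copy.deepcopy can be applied to each permutation.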
I have list of dictionaries. These dictionaries basically have just one key-value each.
For example:
lst = [{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}, {'x': 45},
{'y': 7546}, {'a': 4564}, {'x': 54568}, {'y': 4515}, {'z': 78457},
{'b': 5467}, {'a': 784}]
I am trying to divide the list of dictionaries lst into sublists after every occurrence of a dictionary with a specific key "a".
I tried using other ways that I saw on the internet but as I am new to python, I am not able to understand them and get the desired result. I want the final result to look like:
final_lst = [
[{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}],
[{'x': 45}, {'y': 7546}, {'a': 4564}],
[{'x': 54568}, {'y': 4515}, {'z': 78457}, {'b': 5467}, {'a': 784}],
]
You can use a generator that collects elements and yields when the condition is met:
def split_by_key(lst, key):
    collected = []
    for d in lst:
        collected.append(d)
        if key in d:
            yield collected
            collected = []
    if collected:  # yield any remainder
        yield collected

final_lst = list(split_by_key(lst, 'a'))
Demo:
>>> lst = [{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}, {'x': 45},
... {'y': 7546}, {'a': 4564}, {'x': 54568}, {'y': 4515}, {'z': 78457},
... {'b': 5467}, {'a': 784}]
>>> list(split_by_key(lst, 'a'))
[[{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}], [{'x': 45}, {'y': 7546}, {'a': 4564}], [{'x': 54568}, {'y': 4515}, {'z': 78457}, {'b': 5467}, {'a': 784}]]
>>> from pprint import pprint
>>> pprint(_)
[[{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}],
[{'x': 45}, {'y': 7546}, {'a': 4564}],
[{'x': 54568}, {'y': 4515}, {'z': 78457}, {'b': 5467}, {'a': 784}]]
Here is a straightforward solution:
result = []
for item in lst:
    # start a new sublist at the beginning, or right after an 'a' dict
    if not result or 'a' in result[-1][-1]:
        result.append([])
    result[-1].append(item)
Let's try itertools.groupby.
import itertools
lst2 = []
for i, (_, g) in enumerate(itertools.groupby(lst, key=lambda x: not x.keys() - {'a'})):
    if not i % 2:
        lst2.append([])
    lst2[-1].extend(list(g))
lst2
[[{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}],
[{'x': 45}, {'y': 7546}, {'a': 4564}],
[{'x': 54568}, {'y': 4515}, {'z': 78457}, {'b': 5467}, {'a': 784}]]
You can zip together pairs of delimiting indexes of each partition from a conditional comprehension. Then you comprehend the appropriate slices:
splits = [i for i, d in enumerate(lst, 1) if 'a' in d]
final_lst = [lst[start: end] for start, end in zip([0] + splits, splits)]
# final_lst
# [[{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}], [{'x': 45}, {'y': 7546}, {'a': 4564}], [{'x': 54568}, {'y': 4515}, {'z': 78457}, {'b': 5467}, {'a': 784}]]
Docs on enumerate, zip.
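One caveat with the slicing approach: any items after the last 'a' are dropped, because the last split index is also the last slice end. A hedged variant that keeps such a remainder is sketched below; the function name split_after and the sample data are assumptions for illustration.

```python
def split_after(lst, key):
    """Split lst after every dict containing key, keeping a trailing remainder."""
    splits = [i for i, d in enumerate(lst, 1) if key in d]
    # add a final boundary when the list does not end on a delimiter
    if lst and (not splits or splits[-1] != len(lst)):
        splits.append(len(lst))
    return [lst[start:end] for start, end in zip([0] + splits, splits)]

sample = [{'x': 1}, {'a': 2}, {'y': 3}, {'a': 4}, {'z': 5}]
print(split_after(sample, 'a'))
# [[{'x': 1}, {'a': 2}], [{'y': 3}, {'a': 4}], [{'z': 5}]]
```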
Just to add to the bunch, here is a solution that starts a new sublist at every x instead of ending one at every a:
lst = [{'x':23}, {'y':23432}, {'z':78451}, {'a':564}, {'x':45}, {'y':7546},
{'a':4564}, {'x':54568}, {'y':4515}, {'z':78457}, {'b':5467}, {'a':784}]
result = []
temp = []
breaker = 'x'
for i, item in enumerate(lst):
    if breaker not in item:
        temp.append(item)
    else:
        if i == 0:
            temp.append(item)
        else:
            # a new breaker closes the current group and opens the next one
            result.append(temp)
            temp = [item]
    if i == len(lst)-1:
        result.append(temp)
I have a list containing dictionaries:
[{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb4340', 'y': u'osgb4000'},
{'x': u'osgb4020', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'}]
I wish to count the occurrences of each dict and create a new field count.
The desired outcome looks like this:
[{'x': u'osgb32', 'y': u'osgb4000', 'count': 3},
{'x': u'osgb4340', 'y': u'osgb4000', 'count': 1},
{'x': u'osgb4020', 'y': u'osgb4000', 'count': 1}]
I am unsure how to match dicts.
This is a job for collections.Counter. But first you have to convert your dicts to actual tuples, as dicts are not hashable and thus cannot be used as keys in a Counter object:
>>> import collections
>>> dicts = [{'x': u'osgb32', 'y': u'osgb4000'},
... {'x': u'osgb4340', 'y': u'osgb4000'},
... {'x': u'osgb4020', 'y': u'osgb4000'},
... {'x': u'osgb32', 'y': u'osgb4000'},
... {'x': u'osgb32', 'y': u'osgb4000'}]
>>> collections.Counter(tuple(d.items()) for d in dicts)
Counter({(('y', u'osgb4000'), ('x', u'osgb32')): 3,
(('y', u'osgb4000'), ('x', u'osgb4020')): 1,
(('y', u'osgb4000'), ('x', u'osgb4340')): 1})
Then, you can turn those back into dicts with the added "count" key:
>>> c = collections.Counter(tuple(d.items()) for d in dicts)
>>> [dict(list(k) + [("count", c[k])]) for k in c]
[{'count': 1, 'x': u'osgb4020', 'y': u'osgb4000'},
{'count': 3, 'x': u'osgb32', 'y': u'osgb4000'},
{'count': 1, 'x': u'osgb4340', 'y': u'osgb4000'}]
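One assumption hidden in tuple(d.items()) is that equal dicts list their items in the same order. A sketch that sorts the items first, so the key does not depend on insertion order; the sample data below is made up for illustration, and sorting assumes the keys and values are mutually comparable.

```python
from collections import Counter

dicts = [{'x': 'osgb32', 'y': 'osgb4000'},
         {'y': 'osgb4000', 'x': 'osgb32'},   # same content, different order
         {'x': 'osgb4340', 'y': 'osgb4000'}]

# sorted items give the same tuple for equal dicts, whatever their order
c = Counter(tuple(sorted(d.items())) for d in dicts)

# rebuild each dict from its item tuple and attach the count
counted = [dict(key, count=n) for key, n in c.items()]
print(counted)
```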
You can use Counter and frozenset for this:
from collections import Counter
l = [{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb4340', 'y': u'osgb4000'},
{'x': u'osgb4020', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'}]
c = Counter(frozenset(d.items()) for d in l)
[dict(k, count=v) for k, v in c.items()] # [{'y': u'osgb4000', 'x': u'osgb4340', 'count': 1}, {'y': u'osgb4000', 'x': u'osgb32', 'count': 3}, {'y': u'osgb4000', 'x': u'osgb4020', 'count': 1}]
You can achieve that easily with code below
items = [{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb4340', 'y': u'osgb4000'},
{'x': u'osgb4020', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'}]
result = {}
counted_items = []
for item in items:
    key = item['x'] + '_' + item['y']
    result[key] = result.get(key, 0) + 1
for key, value in result.items():
    x, y = key.split('_')  # same order in which the key was built
    counted_items.append({'x': x, 'y': y, 'count': value})
print(counted_items)  # on Python 3.7+: [{'x': 'osgb32', 'y': 'osgb4000', 'count': 3}, {'x': 'osgb4340', 'y': 'osgb4000', 'count': 1}, {'x': 'osgb4020', 'y': 'osgb4000', 'count': 1}]
Another option is to use Counter. There are plenty of answers on how to deal with collections.Counter :)
Good Luck!
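Joining the values with '_' works only as long as neither value contains an underscore. A sketch of the Counter option mentioned above, keyed on an (x, y) tuple instead of a joined string; the variable names are illustrative, and the output order shown assumes Python 3.7+ dict ordering.

```python
from collections import Counter

items = [{'x': 'osgb32', 'y': 'osgb4000'},
         {'x': 'osgb4340', 'y': 'osgb4000'},
         {'x': 'osgb4020', 'y': 'osgb4000'},
         {'x': 'osgb32', 'y': 'osgb4000'},
         {'x': 'osgb32', 'y': 'osgb4000'}]

# a tuple key is hashable and needs no separator character
counts = Counter((item['x'], item['y']) for item in items)
counted_items = [{'x': x, 'y': y, 'count': n} for (x, y), n in counts.items()]
print(counted_items)
# [{'x': 'osgb32', 'y': 'osgb4000', 'count': 3},
#  {'x': 'osgb4340', 'y': 'osgb4000', 'count': 1},
#  {'x': 'osgb4020', 'y': 'osgb4000', 'count': 1}]
```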
You can pass your list of dicts as the data argument to the DataFrame constructor:
In [74]:
import pandas as pd
data = [{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb4340', 'y': u'osgb4000'},
{'x': u'osgb4020', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'}]
df = pd.DataFrame(data)
df
Out[74]:
x y
0 osgb32 osgb4000
1 osgb4340 osgb4000
2 osgb4020 osgb4000
3 osgb32 osgb4000
4 osgb32 osgb4000
You can then groupby on the columns and call size to get a count:
In [76]:
df.groupby(['x','y']).size()
Out[76]:
x y
osgb32 osgb4000 3
osgb4020 osgb4000 1
osgb4340 osgb4000 1
dtype: int64
and then call to_dict:
In [77]:
df.groupby(['x','y']).size().to_dict()
Out[77]:
{('osgb32', 'osgb4000'): 3,
('osgb4020', 'osgb4000'): 1,
('osgb4340', 'osgb4000'): 1}
You can wrap the above into a list:
In [79]:
[df.groupby(['x','y']).size().to_dict()]
Out[79]:
[{('osgb32', 'osgb4000'): 3,
('osgb4020', 'osgb4000'): 1,
('osgb4340', 'osgb4000'): 1}]
You can reset_index, rename the column and pass arg orient='records':
In [94]:
df.groupby(['x','y']).size().reset_index().rename(columns={0:'count'}).to_dict(orient='records')
Out[94]:
[{'count': 3, 'x': 'osgb32', 'y': 'osgb4000'},
{'count': 1, 'x': 'osgb4020', 'y': 'osgb4000'},
{'count': 1, 'x': 'osgb4340', 'y': 'osgb4000'}]