Counting matching dictionaries - python

I have a list containing dictionaries:
[{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb4340', 'y': u'osgb4000'},
{'x': u'osgb4020', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'}]
I wish to count the occurrences of each dict and add a new field, count.
The desired outcome looks like this:
[{'x': u'osgb32', 'y': u'osgb4000', 'count': 3},
{'x': u'osgb4340', 'y': u'osgb4000', 'count': 1},
{'x': u'osgb4020', 'y': u'osgb4000', 'count': 1}]
I am unsure how to match dicts.

This is a job for collections.Counter. But first you have to convert your dicts to tuples, as dicts are not hashable and thus cannot be used as keys in a Counter object:
>>> import collections
>>> dicts = [{'x': u'osgb32', 'y': u'osgb4000'},
... {'x': u'osgb4340', 'y': u'osgb4000'},
... {'x': u'osgb4020', 'y': u'osgb4000'},
... {'x': u'osgb32', 'y': u'osgb4000'},
... {'x': u'osgb32', 'y': u'osgb4000'}]
>>> collections.Counter(tuple(d.items()) for d in dicts)
Counter({(('y', u'osgb4000'), ('x', u'osgb32')): 3,
(('y', u'osgb4000'), ('x', u'osgb4020')): 1,
(('y', u'osgb4000'), ('x', u'osgb4340')): 1})
Then, you can turn those back into dicts with the added "count" key:
>>> c = collections.Counter(tuple(d.items()) for d in dicts)
>>> [dict(list(k) + [("count", c[k])]) for k in c]
[{'count': 1, 'x': u'osgb4020', 'y': u'osgb4000'},
{'count': 3, 'x': u'osgb32', 'y': u'osgb4000'},
{'count': 1, 'x': u'osgb4340', 'y': u'osgb4000'}]
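One caveat with tuple(d.items()): two dicts that hold the same pairs in a different key order can produce different tuples and end up counted separately. A minimal sketch of one way around that is to sort the items before building the key. It assumes the keys are mutually comparable and the values hashable, and the helper name count_dicts is made up for illustration:

```python
import collections

def count_dicts(dicts):
    """Count equal dicts regardless of key order (hypothetical helper)."""
    # Sorting the items gives every equal dict the same hashable key.
    counts = collections.Counter(tuple(sorted(d.items())) for d in dicts)
    # Rebuild each dict from its (key, value) pairs and attach the count.
    return [dict(key, count=n) for key, n in counts.items()]

dicts = [{'x': 'osgb32', 'y': 'osgb4000'},
         {'y': 'osgb4000', 'x': 'osgb32'},   # same pairs, different key order
         {'x': 'osgb4340', 'y': 'osgb4000'}]
result = count_dicts(dicts)
```

Here the first two dicts collapse into a single entry with count 2, even though their keys were inserted in a different order.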

You can use Counter and frozenset for this:
from collections import Counter
l = [{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb4340', 'y': u'osgb4000'},
{'x': u'osgb4020', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'}]
c = Counter(frozenset(d.items()) for d in l)
[dict(k, count=v) for k, v in c.items()] # [{'y': u'osgb4000', 'x': u'osgb4340', 'count': 1}, {'y': u'osgb4000', 'x': u'osgb32', 'count': 3}, {'y': u'osgb4000', 'x': u'osgb4020', 'count': 1}]

You can achieve that easily with code below
items = [{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb4340', 'y': u'osgb4000'},
{'x': u'osgb4020', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'}]
result = {}
counted_items = []
for item in items:
    key = item['x'] + '_' + item['y']
    result[key] = result.get(key, 0) + 1
for key, value in result.iteritems():
    x, y = key.split('_')  # unpack in the same order the key was built: x, then y
    counted_items.append({'x': x, 'y': y, 'count': value})
print counted_items # [{'y': u'osgb4000', 'x': u'osgb32', 'count': 3}, {'y': u'osgb4000', 'x': u'osgb4340', 'count': 1}, {'y': u'osgb4000', 'x': u'osgb4020', 'count': 1}]
Another option is to use Counter. There are plenty of answers on how to deal with collections.Counter :)
Good Luck!
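Joining the values with '_' breaks down if a value itself contains an underscore. A minimal sketch of the same counting loop, keyed on a tuple instead of a joined string (the variable names are illustrative only):

```python
items = [{'x': 'osgb32', 'y': 'osgb4000'},
         {'x': 'osgb4340', 'y': 'osgb4000'},
         {'x': 'osgb4020', 'y': 'osgb4000'},
         {'x': 'osgb32', 'y': 'osgb4000'},
         {'x': 'osgb32', 'y': 'osgb4000'}]

counts = {}
for item in items:
    key = (item['x'], item['y'])   # tuples are hashable, so no separator is needed
    counts[key] = counts.get(key, 0) + 1

# Unpack each tuple key back into a dict and attach its count.
counted_items = [{'x': x, 'y': y, 'count': n} for (x, y), n in counts.items()]
```

Because the key is never split back apart from a string, the values can contain any characters.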

You can pass your list of dicts as the data argument to the DataFrame constructor:
In [74]:
import pandas as pd
data = [{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb4340', 'y': u'osgb4000'},
{'x': u'osgb4020', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'},
{'x': u'osgb32', 'y': u'osgb4000'}]
df = pd.DataFrame(data)
df
Out[74]:
          x         y
0    osgb32  osgb4000
1  osgb4340  osgb4000
2  osgb4020  osgb4000
3    osgb32  osgb4000
4    osgb32  osgb4000
You can then groupby on the columns and call size to get a count:
In [76]:
df.groupby(['x','y']).size()
Out[76]:
x         y
osgb32    osgb4000    3
osgb4020  osgb4000    1
osgb4340  osgb4000    1
dtype: int64
and then call to_dict:
In [77]:
df.groupby(['x','y']).size().to_dict()
Out[77]:
{('osgb32', 'osgb4000'): 3,
('osgb4020', 'osgb4000'): 1,
('osgb4340', 'osgb4000'): 1}
You can wrap the above into a list:
In [79]:
[df.groupby(['x','y']).size().to_dict()]
Out[79]:
[{('osgb32', 'osgb4000'): 3,
('osgb4020', 'osgb4000'): 1,
('osgb4340', 'osgb4000'): 1}]
You can reset_index, rename the column and pass arg orient='records':
In [94]:
df.groupby(['x','y']).size().reset_index().rename(columns={0:'count'}).to_dict(orient='records')
Out[94]:
[{'count': 3, 'x': 'osgb32', 'y': 'osgb4000'},
{'count': 1, 'x': 'osgb4020', 'y': 'osgb4000'},
{'count': 1, 'x': 'osgb4340', 'y': 'osgb4000'}]

Related

How to transform this dict output?

I have a bit of a logic puzzle. I have a list with the following dict in each list node.
{1: [1,{'X': -0.48595, 'Y': 0.0, 'Z': 0.56283},
2,{'X': -0.48595, 'Y': 0.0, 'Z': -0.6}],
2: [2,{'X': -0.48595, 'Y': 0.0, 'Z': -0.6},
4,{'X': 1.14756, 'Y': 0.0, 'Z': -0.6}],
3: [4,{'X': 1.14756, 'Y': 0.0, 'Z': -0.6},
9,{'X': 1.14756, 'Y': 0.0, 'Z': 0.8}]}
What do I want? A list of lists of nodes, as below:
[{'Id': 1, 'Nodes': [{'Id': 1, 'Position': {'X': -0.48595, 'Y': 0.0, 'Z': 0.56283}},
{'Id': 2, 'Position': {'X': -0.48595, 'Y': 0.0, 'Z': -0.6}}]},
{'Id': 2, 'Nodes': [{'Id': 2, 'Position': {'X': -0.48595, 'Y': 0.0, 'Z': -0.6}},
{'Id': 4, 'Position': {'X': 1.14756, 'Y': 0.0, 'Z': -0.6}}]},
{'Id': 3, 'Nodes': [{'Id': 4, 'Position': {'X': 1.14756, 'Y': 0.0, 'Z': -0.6}},
{'Id': 9, 'Position': {'X': 1.14756, 'Y': 0.0, 'Z': 0.8}}]}]
How can I make that transformation? I keep modifying my code, but every time something is wrong.
mem_list = []
for x in range(len(list_temp)):
    mem_loc_list = []
    for key in list_temp[x]:
        nodes_list = []
        member = {}
        member['Id'] = key
        node_dict = {}
        for y in list_temp[x][key]:
            if type(y) == int:
                node_dict['Id'] = y
            else:
                node_dict['Position'] = y
                nodes_list.append(node_dict)
        member['Nodes'] = nodes_list
        mem_loc_list.append(member)
Using list comprehension you can do it like this:
[[{"Id": k, "Nodes": [{"Id": v[i], "Position": v[i+1]} for i in range(0, len(v), 2)]}
for k,v in data.items()] for data in list_temp]
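For readers who find the nested comprehension dense, the same transformation can be sketched as explicit loops. This assumes, as in the question, that each value is a flat list alternating node id and position dict. The list_temp below is a one-element sample built from the question's data:

```python
list_temp = [{
    1: [1, {'X': -0.48595, 'Y': 0.0, 'Z': 0.56283},
        2, {'X': -0.48595, 'Y': 0.0, 'Z': -0.6}],
    2: [2, {'X': -0.48595, 'Y': 0.0, 'Z': -0.6},
        4, {'X': 1.14756, 'Y': 0.0, 'Z': -0.6}],
}]

mem_list = []
for data in list_temp:
    members = []
    for key, values in data.items():
        nodes = []
        # Walk the flat list two items at a time: (id, position).
        for node_id, position in zip(values[0::2], values[1::2]):
            nodes.append({'Id': node_id, 'Position': position})
        members.append({'Id': key, 'Nodes': nodes})
    mem_list.append(members)
```

Building a fresh dict for every node avoids reusing a single node_dict across iterations, which may be what went wrong in the attempt in the question.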

Remove dictionaries from list where there is more than one with the same hour and minute?

I have a list of dictionaries which have a date string within them. I would like to remove a single entry of two if there is a matching hour and minute for that record.
Here is some sample data, as you can see the first two dictionaries have 14:21 in them, I would only like one of those dictionaries and the other to be removed.
I'm not sure how to even start with this one, is it possible?
[{'x': '2018-06-19 14:21:22', 'y': 80},
{'x': '2018-06-19 14:21:26', 'y': 86},
{'x': '2018-06-19 14:24:02', 'y': 89},
{'x': '2018-06-19 14:24:07', 'y': 95},
{'x': '2018-06-19 14:25:10', 'y': 127}]
This is one approach using a simple iteration and a check list.
Demo:
checkVal = set()
data = [{'x': '2018-06-19 14:21:22', 'y': 80}, {'x': '2018-06-19 14:21:26', 'y': 86}, {'x': '2018-06-19 14:24:02', 'y': 89}, {'x': '2018-06-19 14:24:07', 'y': 95}, {'x': '2018-06-19 14:25:10', 'y': 127}, {'x': '2018-06-19 14:25:14', 'y': 138}, {'x': '2018-06-19 14:28:04', 'y': 91}, {'x': '2018-06-19 14:28:08', 'y': 83}, {'x': '2018-06-19 14:30:11', 'y': 92}, {'x': '2018-06-19 14:30:16', 'y': 99}, {'x': '2018-06-19 14:31:21', 'y': 80}, {'x': '2018-06-19 14:31:26', 'y': 90}, {'x': '2018-06-19 14:34:03', 'y': 131}, {'x': '2018-06-19 14:34:07', 'y': 137}, {'x': '2018-06-19 14:35:28', 'y': 98}, {'x': '2018-06-19 14:35:32', 'y': 91}, {'x': '2018-06-19 14:37:11', 'y': 86}, {'x': '2018-06-19 14:37:16', 'y': 92}, {'x': '2018-06-19 14:39:02', 'y': 111}, {'x': '2018-06-19 14:39:06', 'y': 118}, {'x': '2018-06-19 14:42:03', 'y': 95}, {'x': '2018-06-19 14:42:08', 'y': 104}, {'x': '2018-06-19 14:43:04', 'y': 165}, {'x': '2018-06-19 14:43:09', 'y': 168}, {'x': '2018-06-19 14:45:11', 'y': 89}, {'x': '2018-06-19 14:45:15', 'y': 94}, {'x': '2018-06-19 14:47:11', 'y': 133}, {'x': '2018-06-19 14:47:16', 'y': 146}, {'x': '2018-06-19 14:49:16', 'y': 134}, {'x': '2018-06-19 14:49:21', 'y': 146}, {'x': '2018-06-19 14:52:05', 'y': 157}, {'x': '2018-06-19 14:52:09', 'y': 169}, {'x': '2018-06-19 14:54:13', 'y': 66}, {'x': '2018-06-19 14:54:17', 'y': 63}, {'x': '2018-06-19 14:55:09', 'y': 95}, {'x': '2018-06-19 14:55:14', 'y': 90}, {'x': '2018-06-19 14:58:02', 'y': 112}, {'x': '2018-06-19 14:58:07', 'y': 119}, {'x': '2018-06-19 14:59:09', 'y': 98}, {'x': '2018-06-19 14:59:13', 'y': 91}]
res = []
for i in data:
    if i["x"][:-3] not in checkVal:
        res.append(i)
        checkVal.add(i["x"][:-3])
print(res)
Output:
[{'y': 80, 'x': '2018-06-19 14:21:22'}, {'y': 89, 'x': '2018-06-19 14:24:02'}, {'y': 127, 'x': '2018-06-19 14:25:10'}, {'y': 91, 'x': '2018-06-19 14:28:04'}, {'y': 92, 'x': '2018-06-19 14:30:11'}, {'y': 80, 'x': '2018-06-19 14:31:21'}, {'y': 131, 'x': '2018-06-19 14:34:03'}, {'y': 98, 'x': '2018-06-19 14:35:28'}, {'y': 86, 'x': '2018-06-19 14:37:11'}, {'y': 111, 'x': '2018-06-19 14:39:02'}, {'y': 95, 'x': '2018-06-19 14:42:03'}, {'y': 165, 'x': '2018-06-19 14:43:04'}, {'y': 89, 'x': '2018-06-19 14:45:11'}, {'y': 133, 'x': '2018-06-19 14:47:11'}, {'y': 134, 'x': '2018-06-19 14:49:16'}, {'y': 157, 'x': '2018-06-19 14:52:05'}, {'y': 66, 'x': '2018-06-19 14:54:13'}, {'y': 95, 'x': '2018-06-19 14:55:09'}, {'y': 112, 'x': '2018-06-19 14:58:02'}, {'y': 98, 'x': '2018-06-19 14:59:09'}]
You already have an answer, but for a very efficient solution use the itertools unique_everseen recipe. It's also safer since it will throw a useful error if the input date isn't valid.
from datetime import datetime
from itertools import filterfalse
input_ = [{'x': '2018-06-19 14:21:22', 'y': 80},
{'x': '2018-06-19 14:21:26', 'y': 86},
{'x': '2018-06-19 14:24:02', 'y': 89},
{'x': '2018-06-19 14:24:07', 'y': 95},
{'x': '2018-06-19 14:25:10', 'y': 127}]
def unique_everseen(iterable, key=None):
    """List unique elements, preserving order. Remember all elements ever seen.
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    """
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

def hour_and_min(dct):
    fmt = '%Y-%m-%d %H:%M:%S'
    d = datetime.strptime(dct['x'], fmt)
    return d.hour, d.minute # add `, d.year, d.month, d.day` if you care about these

output = list(unique_everseen(input_, key=hour_and_min))
And output is:
[{'x': '2018-06-19 14:21:22', 'y': 80},
{'x': '2018-06-19 14:24:02', 'y': 89},
{'x': '2018-06-19 14:25:10', 'y': 127}]
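The same first-entry-wins rule can also be sketched with a plain dict keyed on the timestamp truncated to the minute. This is only an illustration: first_per_minute is a made-up name, and it assumes every 'x' value follows the '%Y-%m-%d %H:%M:%S' format and that you run Python 3.7+ so dicts keep insertion order. Unlike a key built from hour and minute alone, this key also includes the date.

```python
from datetime import datetime

def first_per_minute(records, fmt='%Y-%m-%d %H:%M:%S'):
    """Keep the first record seen for each (date, hour, minute)."""
    kept = {}
    for rec in records:
        # strptime raises ValueError on a malformed timestamp.
        minute = datetime.strptime(rec['x'], fmt).replace(second=0)
        kept.setdefault(minute, rec)   # stores only the first record per key
    return list(kept.values())

data = [{'x': '2018-06-19 14:21:22', 'y': 80},
        {'x': '2018-06-19 14:21:26', 'y': 86},
        {'x': '2018-06-19 14:24:02', 'y': 89},
        {'x': '2018-06-19 14:24:07', 'y': 95},
        {'x': '2018-06-19 14:25:10', 'y': 127}]
output = first_per_minute(data)
```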

Permuting a order of items in a list in python

I have a list of dictionary items
[{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 2}]
I want an array of "arrays of dictionaries" containing every possible ordering (permutation) of the list. For example, for the list above there would be 3 factorial (6) orderings:
[[{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 2}],
[{'x': 0, 'y': 0}, {'x': 2, 'y': 2}, {'x': 1, 'y': 0}],
[{'x': 1, 'y': 0}, {'x': 0, 'y': 0}, {'x': 2, 'y': 2}],
[{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 0, 'y': 0}],
[{'x': 2, 'y': 2}, {'x': 1, 'y': 0}, {'x': 0, 'y': 0}],
[{'x': 2, 'y': 2}, {'x': 0, 'y': 0}, {'x': 1, 'y': 0}]]
itertools can do permutations
#!python2
import itertools
yourlist = [{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 2}]
for seq in itertools.permutations(yourlist):
    print seq
'''
({'y': 0, 'x': 0}, {'y': 0, 'x': 1}, {'y': 2, 'x': 2})
({'y': 0, 'x': 0}, {'y': 2, 'x': 2}, {'y': 0, 'x': 1})
({'y': 0, 'x': 1}, {'y': 0, 'x': 0}, {'y': 2, 'x': 2})
({'y': 0, 'x': 1}, {'y': 2, 'x': 2}, {'y': 0, 'x': 0})
({'y': 2, 'x': 2}, {'y': 0, 'x': 0}, {'y': 0, 'x': 1})
({'y': 2, 'x': 2}, {'y': 0, 'x': 1}, {'y': 0, 'x': 0})
'''
If, despite the comments, you are still unsure how to solve your issue, consider the following.
Strategy: use permutations from itertools, which yields tuples in this case. Then convert each tuple to a list to match your required output.
Here is how you could do:
>>> import itertools
>>> lst = [{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 2}]
>>> [list(elem) for elem in list(itertools.permutations(lst))]
[[{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 2}],
[{'x': 0, 'y': 0}, {'x': 2, 'y': 2}, {'x': 1, 'y': 0}],
[{'x': 1, 'y': 0}, {'x': 0, 'y': 0}, {'x': 2, 'y': 2}],
[{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 0, 'y': 0}],
[{'x': 2, 'y': 2}, {'x': 0, 'y': 0}, {'x': 1, 'y': 0}],
[{'x': 2, 'y': 2}, {'x': 1, 'y': 0}, {'x': 0, 'y': 0}]]
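As a quick sanity check on the approach above, the number of orderings should equal n factorial. The sketch below uses only itertools.permutations and math.factorial from the standard library; the variable names are illustrative.

```python
import itertools
import math

lst = [{'x': 0, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 2}]

# permutations() yields tuples lazily; convert each one to a list.
all_orders = [list(p) for p in itertools.permutations(lst)]

# 3 items -> 3! = 6 orderings.
assert len(all_orders) == math.factorial(len(lst))
```

Note that the dicts themselves are not copied: every ordering refers to the same dict objects, so mutating one dict is visible in all orderings.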

Splitting a list of dictionaries into sublists after the occurrence of a particular dictionary key

I have list of dictionaries. These dictionaries basically have just one key-value each.
For example:
lst = [{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}, {'x': 45},
{'y': 7546}, {'a': 4564}, {'x': 54568}, {'y': 4515}, {'z': 78457},
{'b': 5467}, {'a': 784}]
I am trying to divide the list of dictionaries lst into sublists after every occurrence of a dictionary with a specific key "a".
I tried using other ways that I saw on the internet but as I am new to python, I am not able to understand them and get the desired result. I want the final result to look like:
final_lst = [
[{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}],
[{'x': 45}, {'y': 7546}, {'a': 4564}],
[{'x': 54568}, {'y': 4515}, {'z': 78457}, {'b': 5467}, {'a': 784}],
]
You can use a generator that collects elements and yields when the condition is met:
def split_by_key(lst, key):
    collected = []
    for d in lst:
        collected.append(d)
        if key in d:
            yield collected
            collected = []
    if collected: # yield any remainder
        yield collected

final_lst = list(split_by_key(lst, 'a'))
Demo:
>>> lst = [{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}, {'x': 45},
... {'y': 7546}, {'a': 4564}, {'x': 54568}, {'y': 4515}, {'z': 78457},
... {'b': 5467}, {'a': 784}]
>>> list(split_by_key(lst, 'a'))
[[{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}], [{'x': 45}, {'y': 7546}, {'a': 4564}], [{'x': 54568}, {'y': 4515}, {'z': 78457}, {'b': 5467}, {'a': 784}]]
>>> from pprint import pprint
>>> pprint(_)
[[{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}],
[{'x': 45}, {'y': 7546}, {'a': 4564}],
[{'x': 54568}, {'y': 4515}, {'z': 78457}, {'b': 5467}, {'a': 784}]]
Here is a straightforward solution:
result = []
for item in lst:
    if not result or 'a' in result[-1][-1]:
        result.append([])
    result[-1].append(item)
Let's try itertools.groupby.
import itertools
lst2 = []
for i, (_, g) in enumerate(itertools.groupby(lst, key=lambda x: not x.keys() - {'a'})):
    if not i % 2:
        lst2.append([])
    lst2[-1].extend(list(g))
lst2
[[{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}],
[{'x': 45}, {'y': 7546}, {'a': 4564}],
[{'x': 54568}, {'y': 4515}, {'z': 78457}, {'b': 5467}, {'a': 784}]]
You can zip together pairs of delimiting indexes of each partition from a conditional comprehension. Then you comprehend the appropriate slices:
splits = [i for i, d in enumerate(lst, 1) if 'a' in d]
final_lst = [lst[start: end] for start, end in zip([0] + splits, splits)]
# final_lst
# [[{'x': 23}, {'y': 23432}, {'z': 78451}, {'a': 564}], [{'x': 45}, {'y': 7546}, {'a': 4564}], [{'x': 54568}, {'y': 4515}, {'z': 78457}, {'b': 5467}, {'a': 784}]]
Docs on enumerate, zip.
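Note that the slice-based version silently drops any items that come after the last 'a'. A minimal sketch that keeps such a remainder, assuming that is the behaviour you want (split_after_key is a hypothetical name):

```python
def split_after_key(lst, key):
    """Split lst into chunks, each ending with a dict that contains key."""
    # Index just past each dict that contains the key.
    splits = [i for i, d in enumerate(lst, 1) if key in d]
    # Add the end of the list so a trailing remainder is not lost.
    if lst and (not splits or splits[-1] != len(lst)):
        splits.append(len(lst))
    return [lst[start:end] for start, end in zip([0] + splits, splits)]

lst = [{'x': 23}, {'a': 564}, {'x': 45}, {'y': 7546}, {'a': 4564}, {'x': 54568}]
chunks = split_after_key(lst, 'a')
```

With this input the final {'x': 54568} ends up in a chunk of its own instead of being discarded.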
Just to add to the bunch, this would be a solution that splits on 'x' instead of 'a':
lst = [{'x':23}, {'y':23432}, {'z':78451}, {'a':564}, {'x':45}, {'y':7546},
{'a':4564}, {'x':54568}, {'y':4515}, {'z':78457}, {'b':5467}, {'a':784}]
result = []
temp = []
breaker = 'x'
for i, item in enumerate(lst):
    # `breaker not in item` works on Python 2 and 3; comparing keys() to a list only works on Python 2
    if breaker not in item:
        temp.append(item)
    else:
        if i == 0:
            temp.append(item)
        else:
            result.append(temp)
            temp = [item]
    if i == len(lst)-1:
        result.append(temp)

Select highest value from python list of dicts

In a list of list of dicts:
A = [
[{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
[{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 3, 'y': 13}, {'x': 4, 'y': 0}],
[{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}]
]
I need to retrieve the highest 'y' values from each of the list of dicts...so the resulting list would contain:
Z = [(4, 7), (3,13), (1,20)]
In A, the 'x' is the key of each dict while 'y' is the value of each dict.
Any ideas? Thank you.
max accepts an optional key parameter.
A = [
[{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
[{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 3, 'y': 13}, {'x': 4, 'y': 0}],
[{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}]
]
Z = []
for a in A:
    d = max(a, key=lambda d: d['y'])
    Z.append((d['x'], d['y']))
print Z
UPDATE
suggested by – J.F. Sebastian:
from operator import itemgetter
Z = [itemgetter(*'xy')(max(lst, key=itemgetter('y'))) for lst in A]
I'd use itemgetter and max's key argument:
from operator import itemgetter
pair_getter = itemgetter('x', 'y')
[pair_getter(max(d, key=itemgetter('y'))) for d in A]
[max(((d['x'], d['y']) for d in l), key=lambda t: t[1]) for l in A]
The solution to your stated problem has been given, but I suggest changing your underlying data structure. Tuples are much faster for small elements such as a point. You may retain the clarity of a dictionary by using namedtuple if you so desire.
>>> from collections import namedtuple
>>> A = [
...     [{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
...     [{'x': 1, 'y': 0}, {'x': 2, 'y': 2}, {'x': 3, 'y': 13}, {'x': 4, 'y': 0}],
...     [{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}]
... ]
Making a Point namedtuple is simple
>>> Point = namedtuple('Point', 'x y')
This is what an instance looks like
>>> Point(x=1, y=0) # Point(1, 0) also works
Point(x=1, y=0)
A would then look like this
>>> A = [[Point(**y) for y in x] for x in A]
>>> A
[[Point(x=1, y=0), Point(x=2, y=3), Point(x=3, y=4), Point(x=4, y=7)],
[Point(x=1, y=0), Point(x=2, y=2), Point(x=3, y=13), Point(x=4, y=0)],
[Point(x=1, y=20), Point(x=2, y=4), Point(x=3, y=0), Point(x=4, y=8)]]
Now working like this is much easier:
>>> from operator import attrgetter
>>> [max(row, key=attrgetter('y')) for row in A]
[Point(x=4, y=7), Point(x=3, y=13), Point(x=1, y=20)]
To retain the speed advantages of tuples it's better to access by index:
>>> from operator import itemgetter
>>> [max(row, key=itemgetter(1)) for row in A]  # y is at index 1; index 2 would raise IndexError
[Point(x=4, y=7), Point(x=3, y=13), Point(x=1, y=20)]
result = []
for item in A:
    new = sorted(item, key=lambda k: k['y'], reverse=True)
    result.append((new[0]['x'], new[0]['y']))
print(result)
Note: this is not the most efficient way to do it (sorting each row does more work than a single max call), but it is one way to get the required result.
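One edge case the snippets above do not cover is an empty inner list, where max() raises ValueError. A small sketch, assuming Python 3.4+ (for the default argument of max) and that skipping empty rows is acceptable for your data:

```python
from operator import itemgetter

A = [
    [{'x': 1, 'y': 0}, {'x': 2, 'y': 3}, {'x': 3, 'y': 4}, {'x': 4, 'y': 7}],
    [],  # an empty row, added here only to illustrate the edge case
    [{'x': 1, 'y': 20}, {'x': 2, 'y': 4}, {'x': 3, 'y': 0}, {'x': 4, 'y': 8}],
]

get_xy = itemgetter('x', 'y')
Z = []
for row in A:
    best = max(row, key=itemgetter('y'), default=None)  # None for an empty row
    if best is not None:
        Z.append(get_xy(best))
```

If an empty row should instead produce a placeholder entry, append that placeholder in an else branch rather than skipping the row.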
