I am trying to replace a small SQL database with a dictionary. The only problem I am facing is with querying: it's becoming very complicated. Here is an example:
foo={'id_1': {'location': 'location_1', 'material': 'A'},
'id_2': {'location': 'location_1', 'material': 'A'},
'id_3': {'location': 'location_1', 'material': 'B'},
'id_4': {'location': 'location_2', 'material': 'B'},
'id_5': {'location': 'location_2', 'material': 'A'},
'id_6': {'location': 'location_1', 'material': 'C'},
'id_7': {'location': 'location_1', 'material': 'A'},
'id_8': {'location': 'location_2', 'material': 'B'}}
So, I want to run a query grouped by location, and the result should look like this:
{'location_1' : {'A': 3, 'B': 1, 'C': 1}, 'location_2': {'A':1,'B':2}}
Is there any way to query a Python dictionary? Or at least a neat way of doing it?
Thanks
You'd need to use a defaultdict() and a Counter() object to achieve what you want:
from collections import Counter, defaultdict
results = defaultdict(Counter)
for entry in foo.values():
    results[entry['location']][entry['material']] += 1
which produces:
defaultdict(<class 'collections.Counter'>, {
'location_2': Counter({'B': 2, 'A': 1}),
'location_1': Counter({'A': 3, 'C': 1, 'B': 1})
})
but using an actual database (such as the bundled sqlite3) would be far more efficient.
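For comparison, here is a minimal sketch of the sqlite3 route mentioned above. The table name items and its column names are assumptions made for illustration, not something from the question; the database lives in memory only.

```python
import sqlite3

foo = {
    'id_1': {'location': 'location_1', 'material': 'A'},
    'id_2': {'location': 'location_1', 'material': 'A'},
    'id_3': {'location': 'location_1', 'material': 'B'},
    'id_4': {'location': 'location_2', 'material': 'B'},
    'id_5': {'location': 'location_2', 'material': 'A'},
    'id_6': {'location': 'location_1', 'material': 'C'},
    'id_7': {'location': 'location_1', 'material': 'A'},
    'id_8': {'location': 'location_2', 'material': 'B'},
}

# In-memory database; the schema below is hypothetical.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE items (id TEXT PRIMARY KEY, location TEXT, material TEXT)'
)
conn.executemany(
    'INSERT INTO items VALUES (?, ?, ?)',
    [(key, row['location'], row['material']) for key, row in foo.items()],
)

# Let SQL do the counting, then fold the result rows into nested dicts.
query = (
    'SELECT location, material, COUNT(*) FROM items '
    'GROUP BY location, material ORDER BY location, material'
)
results = {}
for location, material, count in conn.execute(query):
    results.setdefault(location, {})[material] = count
conn.close()

print(results)
```

The GROUP BY clause does the same job as the nested defaultdict(Counter), so the Python side only has to reshape rows into the nested dictionary.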
How about this:
d = {}
for k, v in foo.items():
    loc = v['location']
    mat = v['material']
    d.setdefault(loc, {})
    d[loc].setdefault(mat, 0)
    d[loc][mat] += 1
print(d)
Output:
{'location_2': {'A': 1, 'B': 2}, 'location_1': {'A': 3, 'C': 1, 'B': 1}}
Related
I'm trying to code a faster way to solve the following problem but I don't know how to do it:
I have the following list of dicts and list of identifiers:
list_of_dicts = [{'id': 1, 'name': 'A'}, {'id': 2, 'name': 'B'}, {'id': 3, 'name': 'C'}, {'id': 4, 'name': 'D'}]
list_of_ids = [1, 3, 2, 4, 1, 3, 4]
I'd like to have the following output:
[{'id': 1, 'name': 'A'}, {'id': 3, 'name': 'C'}, {'id': 2, 'name': 'B'}, {'id': 4, 'name': 'D'}, {'id': 1, 'name': 'A'}, {'id': 3, 'name': 'C'}, {'id': 4, 'name': 'D'}]
The way I'm doing it is:
list_of_dict_ids = [d['id'] for d in list_of_dicts]
ordered_list_by_ids = [list_of_dicts[list_of_dict_ids.index(i)] for i in list_of_ids]
Is there any faster way to do it?
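You can do it like this:
>>> dic = {d["id"]: d for d in list_of_dicts}
>>> dic
{1: {'id': 1, 'name': 'A'}, 2: {'id': 2, 'name': 'B'}, 3: {'id': 3, 'name': 'C'}, 4: {'id': 4, 'name': 'D'}}
>>> lst = [dic[i] for i in list_of_ids]
>>> lst
[{'id': 1, 'name': 'A'}, {'id': 3, 'name': 'C'}, {'id': 2, 'name': 'B'}, {'id': 4, 'name': 'D'}, {'id': 1, 'name': 'A'}, {'id': 3, 'name': 'C'}, {'id': 4, 'name': 'D'}]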
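The reason this is faster: list.index() scans the list on every call, while a dict lookup takes constant time on average, so the index is built once and each id is then resolved directly. A small sketch wrapping the idea in a function (reorder_by_ids is a hypothetical name, and it assumes every id in list_of_ids exists exactly once in list_of_dicts):

```python
def reorder_by_ids(records, ids):
    """Return the records matching ids, in the order and multiplicity of ids."""
    # One pass to build the index: O(len(records)).
    by_id = {record['id']: record for record in records}
    # Each lookup is O(1) on average instead of an O(n) list.index() scan.
    # A missing id raises KeyError.
    return [by_id[i] for i in ids]


list_of_dicts = [
    {'id': 1, 'name': 'A'},
    {'id': 2, 'name': 'B'},
    {'id': 3, 'name': 'C'},
    {'id': 4, 'name': 'D'},
]
list_of_ids = [1, 3, 2, 4, 1, 3, 4]

ordered = reorder_by_ids(list_of_dicts, list_of_ids)
print([d['name'] for d in ordered])  # ['A', 'C', 'B', 'D', 'A', 'C', 'D']
```

Note one behavioural difference: if two records shared an id, the dict would keep the last one, whereas index() returns the first.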
Consider the following list:
l=[[{'a': 'a'}, {'a': 'b'}, {'a': 1}], [{'a': 'a'}, {'a': 'b'}, {'a': 1}], [{'a': 'a'}, {'a': 'c'}, {'a': 1}]]
I would like to find the number of distinct elements at the same position across the sublists of this list.
Example
If position=1, the output would be 2 ({'a': 'b'} and {'a': 'c'}).
If position=0, the output would be 1 ({'a': 'a'}).
Is there a way to do this using map/lambda? I don't want to write a loop for this.
Thank you.
This would work:
len(set(map(lambda x: tuple(x[position].items()), l)))
Although I'd recommend never using such code in real life.
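If readability matters more than avoiding a comprehension, one possible alternative is a small named function. This is only a sketch: count_distinct_at is a hypothetical name, and it assumes the dict values are hashable.

```python
def count_distinct_at(rows, position):
    """Count the distinct dicts found at index `position` of each row."""
    # frozenset(items()) is hashable and ignores key order,
    # unlike tuple(items()), which depends on insertion order.
    return len({frozenset(row[position].items()) for row in rows})


l = [
    [{'a': 'a'}, {'a': 'b'}, {'a': 1}],
    [{'a': 'a'}, {'a': 'b'}, {'a': 1}],
    [{'a': 'a'}, {'a': 'c'}, {'a': 1}],
]

print(count_distinct_at(l, 0))  # 1
print(count_distinct_at(l, 1))  # 2
print(count_distinct_at(l, 2))  # 1
```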
I have a list of dicts. I need to find which dicts are repeated and build a new list that contains each distinct dict only once, with its amount set to the number of times it appears in the first list.
For example:
I have this list:
[{'a': 123, 'b': 1234, 'c': 'john', 'amount': 1},
{'a': 456, 'b': 1234, 'c': 'doe','amount': 1},
{'a': 456, 'b': 1234, 'c': 'steve','amount': 1},
{'a': 123, 'b': 1234, 'c': 'john','amount': 1},
{'a': 123, 'b': 1234, 'c': 'john','amount': 1}]
I need to output:
[{'a': 123, 'b': 1234, 'c': 'john', 'amount': 3},
{'a': 456, 'b': 1234, 'c': 'steve','amount': 1},
{'a': 456, 'b': 1234, 'c': 'doe','amount': 1}]
I've tried some things I found by Googling, but nothing works completely. The last thing I tried tells me where the repeated ones are, but I'm stuck on what to do next.
def index(lst, element):
    result = []
    offset = -1
    while True:
        try:
            offset = lst.index(element, offset+1)
        except ValueError:
            return result
        result.append(offset)
for i in l:
    if len(index(l,i)) > 1:
        i['amount'] += 1
print(l)
But it returns
[{'a': 123, 'c': 'john', 'b': 1234, 'amount': 2},
{'a': 456, 'c': 'doe', 'b': 1234, 'amount': 1},
{'a': 456, 'c': 'steve', 'b': 1234, 'amount': 1},
{'a': 123, 'c': 'john', 'b': 1234, 'amount': 2},
{'a': 123, 'c': 'john', 'b': 1234, 'amount': 1}]
Here is an option using pandas: load the list of dictionaries (called mylist here) into a DataFrame, group by columns a, b and c, and sum the amount column. To get dictionaries back, use the DataFrame's built-in to_dict() method; with the orient parameter set to 'index', its values give the desired output:
import pandas as pd
list(pd.DataFrame(mylist).groupby(['a', 'b', 'c']).sum().reset_index().to_dict('index').values())
# [{'a': 123, 'amount': 3, 'b': 1234, 'c': 'john'},
# {'a': 456, 'amount': 1, 'b': 1234, 'c': 'doe'},
# {'a': 456, 'amount': 1, 'b': 1234, 'c': 'steve'}]
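If pandas is not available, the same grouping can be sketched with the standard library alone. This is an illustrative alternative rather than part of the pandas approach; mylist holds the input from the question, and the name totals is made up here.

```python
from collections import defaultdict

mylist = [
    {'a': 123, 'b': 1234, 'c': 'john', 'amount': 1},
    {'a': 456, 'b': 1234, 'c': 'doe', 'amount': 1},
    {'a': 456, 'b': 1234, 'c': 'steve', 'amount': 1},
    {'a': 123, 'b': 1234, 'c': 'john', 'amount': 1},
    {'a': 123, 'b': 1234, 'c': 'john', 'amount': 1},
]

# Sum the amounts per (a, b, c) key, like groupby(['a', 'b', 'c']).sum().
totals = defaultdict(int)
for row in mylist:
    totals[(row['a'], row['b'], row['c'])] += row['amount']

# Rebuild one dict per distinct key, in first-seen order.
merged = [
    {'a': a, 'b': b, 'c': c, 'amount': amount}
    for (a, b, c), amount in totals.items()
]
print(merged)
```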
I have a data structure that looks like this
arrayObjects = [{id: 1, array1: [a,b,c]}, {id: 2, array1: [d,e,f]}]
and would like to transform it into this:
newArrayObjects = [{id: 1, term: a}, {id:1, term: b}, ... {id:2, term: f} ]
Any idea how to do this?
This is my minimal version right now:
for item in arrayObjects:
    for term in item['array1']:
        print(term, item['id'])
To clarify: I know how to do this with a nested loop; I'm just going for the most Pythonic version possible, haha.
You can use a list comprehension:
>>> a = [{'id': 1, 'array': ['a','b','c']}, {'id': 2, 'array': ['d','e','f']}]
>>> [{'id': d['id'], 'term': v } for d in a for v in d['array']]
[{'term': 'a', 'id': 1}, {'term': 'b', 'id': 1}, {'term': 'c', 'id': 1}, {'term': 'd', 'id': 2}, {'term': 'e', 'id': 2}, {'term': 'f', 'id': 2}]
I'm new to Python and I have this structure, retrieved from a DB:
data=[
{'Value': '0.2', 'id': 1},
{'Value': '1.2', 'id': 1},
{'Value': '33.34', 'id': 2},
{'Value': '44.3', 'id': 3},
{'Value': '33.23', 'id': 3},
{'Value': '21.1', 'id': 4},
{'Value': '5.33', 'id': 4},
{'Value': '33.3', 'id': 5},
{'Value': '12.2', 'id': 5},
{'Value': '1.22', 'id': 5},
{'Value': '1.23', 'id': 6}
]
I know that I can get the id of a record with:
data[i]['id']
but I need to group the records by ID in a suitable data structure, in order to get the average value for every ID.
What is the best choice for this?
I'm thinking of building a new dict for every set of IDs, but the number of IDs can grow, and I can't figure out how to tackle this problem. If someone can give me some ideas I would be very grateful.
Assuming your data is sorted by ID as it appears in your data variable, you can try using itertools.groupby, which can be instructed to group by id. You can then create a new dictionary that has keys equal to the id numbers and values equal to the means:
In [1]: from itertools import groupby
In [2]: data=[
...: {'Value': '0.2', 'id': 1},
...: {'Value': '1.2', 'id': 1},
...: {'Value': '33.34', 'id': 2},
...: {'Value': '44.3', 'id': 3},
...: {'Value': '33.23', 'id': 3},
...: {'Value': '21.1', 'id': 4},
...: {'Value': '5.33', 'id': 4},
...: {'Value': '33.3', 'id': 5},
...: {'Value': '12.2', 'id': 5},
...: {'Value': '1.22', 'id': 5},
...: {'Value': '1.23', 'id': 6}
...: ]
In [3]: means = {}
In [4]: for k, g in groupby(data, key=lambda x: x['id']):
...:     g = list(g)
...:     means[k] = sum(float(x['Value']) for x in g) / len(g)
...:
...:
In [5]: means
Out[5]:
{1: 0.69999999999999996,
2: 33.340000000000003,
3: 38.765000000000001,
4: 13.215,
5: 15.573333333333332,
6: 1.23}
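The sorted-input assumption matters because itertools.groupby only merges consecutive items with equal keys. If the records might arrive in any order, one option is to sort by id first. A minimal sketch, using a made-up unsorted_data sample and statistics.mean in place of the manual sum/len:

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

# Hypothetical sample where the records for id 1 are not adjacent.
unsorted_data = [
    {'Value': '1.2', 'id': 1},
    {'Value': '33.34', 'id': 2},
    {'Value': '0.2', 'id': 1},
    {'Value': '44.3', 'id': 3},
    {'Value': '33.23', 'id': 3},
]

get_id = itemgetter('id')

# Sorting first guarantees that equal ids sit next to each other.
means = {
    key: mean(float(row['Value']) for row in group)
    for key, group in groupby(sorted(unsorted_data, key=get_id), key=get_id)
}
print(means)
```

Without the sorted() call, id 1 would show up as two separate groups and the second one would overwrite the first in the dictionary.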
(Updated: after DSM's comment.)
You could reshape the data like this:
from collections import defaultdict
data=[
{'Value': '0.2', 'id': 1},
{'Value': '1.2', 'id': 1},
{'Value': '33.34', 'id': 2},
{'Value': '44.3', 'id': 3},
{'Value': '33.23', 'id': 3},
{'Value': '21.1', 'id': 4},
{'Value': '5.33', 'id': 4},
{'Value': '33.3', 'id': 5},
{'Value': '12.2', 'id': 5},
{'Value': '1.22', 'id': 5},
{'Value': '1.23', 'id': 6}
]
newdata = defaultdict(list)
for r in data:
    newdata[r['id']].append(float(r['Value']))
This would yield:
In [2]: newdata
Out[2]: defaultdict(<type 'list'>, {1: [0.2, 1.2], 2: [33.34], 3: [44.3, 33.23], 4: [21.1, 5.33], 5: [33.3, 12.2, 1.22], 6: [1.23]})
(Update 2)
Calculating the means is now simple with a dictionary comprehension:
mean = {key: sum(values) / len(values) for key, values in newdata.items()}
Which gives:
In [4]: mean
Out[4]: {1: 0.7, 2: 33.34, 3: 38.765, 4: 13.215, 5: 15.573333333333332, 6: 1.23}
If you have numpy, you could easily compute the mean of the id field across all records:
import numpy
numpy.mean([x['id'] for x in data])
Otherwise, it would be as simple as:
from __future__ import division  # only needed on Python 2
ids = [x['id'] for x in data]
print(sum(ids) / len(ids))
You can simply create a list of IDs after all have been collected:
id_list = [element['id'] for element in data]
From there you can calculate whatever you want.