Python: merging 2 dictionaries (from csv file) with same key AND values

I have these two csv files, read with DictReader:
A = {"Name": "Alex", "Age": 17} {"Name": "Bob", "Age": 20} {"Name": "Clark", "Age": 24}
B = {"Age": 17, "Class": "Economy"} {"Age": 24, "Class": "IT"} {"Age": 17, "Class": "Arts"}
and several more rows like these.
Is it possible to join them to form this:
{"Name": "Alex", "Age": 17, "Class": [{"Economy"}, {"Arts"}]}
{"Name": "Clark", "Age": 24, "Class": [{"IT"}]}
In short, can I join them where they have the same Age, and put all the matching classes into a list?
So far I've only read both files:
import csv
A=open('A.csv')
A_reader = csv.DictReader(A)
B=open('B.csv')
B_reader = csv.DictReader(B)
for item in A_reader:
    print(item)
for item in B_reader:
    print(item)
but I'm unsure how to merge them as described.
Thank you!
EDIT: The csv files are such that no two people have the same age.

import copy
A = [{"Name": "Alex", "Age": 17}, {"Name": "Bob", "Age": 20}, {"Name": "Clark", "Age": 24}]
B = [{"Age": 17, "Class": "Economy"}, {"Age": 24, "Class": "IT"}, {"Age":17, "Class": "Arts"}]
C = []
for a in A:
    c = copy.copy(a)
    c["Class"] = []
    for b in B:
        if a["Age"]==b["Age"]:
            c["Class"].append(b["Class"])
    C.append(c)
Result is:
[{'Name': 'Alex', 'Age': 17, 'Class': ['Economy', 'Arts']},
{'Name': 'Bob', 'Age': 20, 'Class': []},
{'Name': 'Clark', 'Age': 24, 'Class': ['IT']}]
If it doesn't work for you, let me know :)

I'd first turn B into a dictionary {age: [classes]}, then loop over A and combine the dictionaries. The more data you have, the more efficient this will be compared to looping over B over and over again. I'd use a collections.defaultdict for that [1].
from collections import defaultdict
# {17: ['Economy', 'Arts'], 24: ['IT']}
classes_by_age = defaultdict(list)
for b in B:
    classes_by_age[b['Age']].append(b['Class'])
With that in place, all you need to do is merge the dictionaries. I guess one of the most concise ways to do that is by passing a combination of the double asterisk operator ** and a keyword argument to the dict constructor:
merged = [dict(**a, Classes=classes_by_age[a['Age']]) for a in A]
[1] If you don't want to import defaultdict, you can simply initialize classes_by_age as an empty dictionary and in the loop do:
for b in B:
    age = b['Age']
    class_ = b['Class']
    if age in classes_by_age:
        classes_by_age[age].append(class_)
    else:
        classes_by_age[age] = [class_]
But then you'd also have to adapt the final list comprehension to the one below, otherwise ages without any classes would raise a KeyError:
[dict(**a, Classes=classes_by_age.get(a['Age'], [])) for a in A]
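Putting the pieces together with csv.DictReader might look like the sketch below. It assumes the two files have the columns shown in the question; io.StringIO objects stand in for A.csv and B.csv so the snippet runs on its own. Note that DictReader yields every value as a string, so the ages are compared as "17" rather than 17, and people without any class (Bob) are left out to match the output the question asks for.

```python
import csv
import io
from collections import defaultdict

# Stand-ins for A.csv and B.csv (assumed contents, based on the question).
a_csv = io.StringIO("Name,Age\nAlex,17\nBob,20\nClark,24\n")
b_csv = io.StringIO("Age,Class\n17,Economy\n24,IT\n17,Arts\n")

# Group the classes by age; DictReader yields every value as a string.
classes_by_age = defaultdict(list)
for row in csv.DictReader(b_csv):
    classes_by_age[row["Age"]].append(row["Class"])

# Attach the matching class list to each person, skipping people with no class.
merged = [
    {**person, "Class": classes_by_age[person["Age"]]}
    for person in csv.DictReader(a_csv)
    if person["Age"] in classes_by_age
]

for record in merged:
    print(record)
```

If the ages are needed as numbers, converting with int(row["Age"]) while reading would be one option.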

You mentioned Pandas in a comment. If that is an option, then you could try:
import pandas as pd
df_A = pd.read_csv("A.csv")
df_B = pd.read_csv("B.csv")
result = df_A.merge(
    df_B.groupby("Age")["Class"].agg(list), on="Age", how="left"
).to_dict("records")

Related

working with list of dictionaries in python

I have a list of dictionaries that looks like this:
raw_list = [
{"item_name": "orange", "id": 12, "total": 2},
{"item_name": "apple", "id": 12},
{"item_name": "apple", "id": 34, "total": 22},
]
Expected output should be
[
{"item_name": ["orange", "apple"], "id": 12, "total": 2},
{"item_name": "apple", "id": 34, "total": 22},
]
but what I got is
[
{"item_name": "orangeapple", "id": 12, "total": 2},
{"item_name": "apple", "id": 34, "total": 22},
]
Here is my code:
comp_key = "id"
conc_key = "item_name"
res = []
for ele in raw_list:
    temp = False
    for ele1 in res:
        if ele1[comp_key] == ele[comp_key]:
            ele1[conc_key] = ele1[conc_key] + ele[conc_key]
            temp = True
            break
    if not temp:
        res.append(ele)
How can I resolve this?

Something like this - the special sauce is the isinstance stuff to make sure you're making the concatenated value a list instead.
Do note that this assumes the raw list is ordered by the comp_key (id), and will misbehave if that's not the case.
raw_list = [
{"item_name": "orange", "id": 12, "total": 2},
{"item_name": "apple", "id": 12},
{"item_name": "apple", "id": 34, "total": 22},
]
comp_key = "id"
conc_key = "item_name"
grouped_items = []
for item in raw_list:
    last_group = grouped_items[-1] if grouped_items else None
    if not last_group or last_group[comp_key] != item[comp_key]:  # Starting a new group?
        grouped_items.append(item.copy())  # Shallow-copy the item into the result array
    else:
        if not isinstance(last_group[conc_key], list):
            # The concatenation key is not a list yet, make it so
            last_group[conc_key] = [last_group[conc_key]]
        last_group[conc_key].append(item[conc_key])
print(grouped_items)
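If the input cannot be assumed to be ordered by id, one possible variation is to keep the groups in a dictionary keyed by the comparison value instead of only looking at the last group. The sketch below is an illustration rather than part of the answer above; the helper name group_by_key is made up, and the sample list is deliberately shuffled so that the two id 12 entries are not adjacent.

```python
raw_list = [
    {"item_name": "orange", "id": 12, "total": 2},
    {"item_name": "apple", "id": 34, "total": 22},
    {"item_name": "apple", "id": 12},
]

def group_by_key(items, comp_key="id", conc_key="item_name"):
    """Merge dicts sharing comp_key, collecting conc_key values into a list."""
    groups = {}  # comp_key value -> merged dict, in first-seen order
    for item in items:
        key = item[comp_key]
        if key not in groups:
            groups[key] = item.copy()  # first item of a group is kept as-is
            continue
        group = groups[key]
        if not isinstance(group[conc_key], list):
            group[conc_key] = [group[conc_key]]  # promote the single value to a list
        group[conc_key].append(item[conc_key])
    return list(groups.values())

print(group_by_key(raw_list))
```

As in the loop above, only the concatenation key is merged; other keys of later items in a group are ignored.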

You could use pandas to group by id, aggregate the two columns separately, then convert the result to a list of dicts:
import pandas as pd
df = pd.DataFrame(raw_list)
dd = pd.concat([df.groupby('id')['item_name'].apply(list), df.groupby('id')['total'].sum()], axis=1).reset_index()
dd.to_dict('records')
[{'id': 12, 'item_name': ['orange', 'apple'], 'total': 2.0},
{'id': 34, 'item_name': ['apple'], 'total': 22.0}]

You could use itertools.groupby for grouping by id and collections.defaultdict to combine values with the same keys into lists.
from itertools import groupby
from collections import defaultdict
id_getter = lambda x: x['id']
gp = groupby(sorted(raw_list, key=id_getter), key=id_getter)
out = []
for _, i in gp:
    subdict = defaultdict(list)
    for j in i:
        for k, v in j.items():
            subdict[k].append(v)
    out.append(dict(subdict))
out
When working with complex datatypes such as nested lists and dictionaries, I would advise making full use of the APIs provided by collections and itertools.

Converting CSV to Hierarchical JSON output

I am trying to convert a CSV file into a hierarchical JSON file. The CSV input is as follows; it contains two columns, gene and disease.
gene,disease
A1BG,Adenocarcinoma
A1BG,apnea
A1BG,Athritis
A2M,Asthma
A2M,Astrocytoma
A2M,Diabetes
NAT1,polyps
NAT1,lymphoma
NAT1,neoplasms
The expected output should be in the following format:
{
"name": "A1BG",
"children": [
{"name": "Adenocarcinoma"},
{"name": "apnea"},
{"name": "Athritis"}
]
},
{
"name": "A2M",
"children": [
{"name": "Asthma"},
{"name": "Astrocytoma"},
{"name": "Diabetes"}
]
},
{
"name": "NAT1",
"children": [
{"name": "polyps"},
{"name": "lymphoma"},
{"name": "neoplasms"}
]
}
The Python code I have written is below. Let me know what I need to change to get the desired output.
import json
finalList = []
finalDict = {}
grouped = df.groupby(['gene'])
for key, value in grouped:
    dictionary = {}
    dictList = []
    anotherDict = {}
    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['name'] = j.at[0, 'gene']
    for i in j.index:
        anotherDict['disease'] = j.at[i, 'disease']
        dictList.append(anotherDict)
    dictionary['children'] = dictList
    finalList.append(dictionary)
with open('outputresult3.json', "w") as out:
    json.dump(finalList,out)

import json
json_data = []
# group the data by each unique gene
# (group by the column name, not a one-element list, so that `gene` is a plain string)
for gene, data in df.groupby("gene"):
    # obtain a list of diseases for the current gene
    diseases = data["disease"].tolist()
    # create a new list of dictionaries to satisfy json requirements
    children = [{"name": disease} for disease in diseases]
    entry = {"name": gene, "children": children}
    json_data.append(entry)
with open('outputresult3.json', "w") as out:
    json.dump(json_data, out)

Use DataFrame.groupby with a custom lambda function to convert each group's values to dictionaries with DataFrame.to_dict:
L = (df.rename(columns={'disease':'name'})
       .groupby('gene')
       .apply(lambda x: x[['name']].to_dict('records'))
       .reset_index(name='children')
       .rename(columns={'gene':'name'})
       .to_dict('records')
    )
print(L)
[{'name': 'A1BG', 'children': [{'name': 'Adenocarcinoma'},
{'name': 'apnea'},
{'name': 'Athritis'}]},
{'name': 'A2M', 'children': [{'name': 'Asthma'},
{'name': 'Astrocytoma'},
{'name': 'Diabetes'}]},
{'name': 'NAT1', 'children': [{'name': 'polyps'},
{'name': 'lymphoma'},
{'name': 'neoplasms'}]}]
with open('outputresult3.json', "w") as out:
    json.dump(L,out)
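For comparison, the same reshaping can be sketched without pandas, using only the standard library. This is an illustrative alternative, not one of the answers above: the CSV text is a shortened, assumed excerpt of the question's data, read from an in-memory string so the snippet runs on its own.

```python
import csv
import io
import json
from collections import defaultdict

# Stand-in for the CSV file (an assumed excerpt of the question's data).
csv_text = """gene,disease
A1BG,Adenocarcinoma
A1BG,apnea
A2M,Asthma
NAT1,polyps
"""

# Collect the diseases per gene; dicts preserve first-seen gene order.
diseases_by_gene = defaultdict(list)
for row in csv.DictReader(io.StringIO(csv_text)):
    diseases_by_gene[row["gene"]].append(row["disease"])

# Reshape into the {"name": ..., "children": [...]} hierarchy.
hierarchy = [
    {"name": gene, "children": [{"name": disease} for disease in diseases]}
    for gene, diseases in diseases_by_gene.items()
]

print(json.dumps(hierarchy, indent=2))
```

To write the result to a file instead of printing it, json.dump(hierarchy, out) inside a with open(...) block would work as in the answers above.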

Search nested dictionary values in a list and return whole nested dictionary that contains searched value [duplicate]

Assume I have this:
[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]
and by searching for "Pam" as the name, I want to retrieve the matching dictionary: {"name": "Pam", "age": 7}
How can I achieve this?

You can use a generator expression:
>>> dicts = [
... { "name": "Tom", "age": 10 },
... { "name": "Mark", "age": 5 },
... { "name": "Pam", "age": 7 },
... { "name": "Dick", "age": 12 }
... ]
>>> next(item for item in dicts if item["name"] == "Pam")
{'age': 7, 'name': 'Pam'}
If you need to handle the item not being there, then you can do what user Matt suggested in his comment and provide a default using a slightly different API:
next((item for item in dicts if item["name"] == "Pam"), None)
And to find the index of the item, rather than the item itself, you can enumerate() the list:
next((i for i, item in enumerate(dicts) if item["name"] == "Pam"), None)

This looks to me the most pythonic way:
people = [
{'name': "Tom", 'age': 10},
{'name': "Mark", 'age': 5},
{'name': "Pam", 'age': 7}
]
filter(lambda person: person['name'] == 'Pam', people)
result (returned as a list in Python 2):
[{'age': 7, 'name': 'Pam'}]
Note: In Python 3, a filter object is returned. So the python3 solution would be:
list(filter(lambda person: person['name'] == 'Pam', people))

@Frédéric Hamidi's answer is great. In Python 3.x the generator's .next() method was renamed to __next__(), so use the built-in next() function instead. Thus a slight modification:
>>> dicts = [
... { "name": "Tom", "age": 10 },
... { "name": "Mark", "age": 5 },
... { "name": "Pam", "age": 7 },
... { "name": "Dick", "age": 12 }
... ]
>>> next(item for item in dicts if item["name"] == "Pam")
{'age': 7, 'name': 'Pam'}
As mentioned in the comments by @Matt, you can add a default value as such:
>>> next((item for item in dicts if item["name"] == "Pam"), False)
{'name': 'Pam', 'age': 7}
>>> next((item for item in dicts if item["name"] == "Sam"), False)
False

You can use a list comprehension:
def search(name, people):
    return [element for element in people if element['name'] == name]

I tested various methods to go through a list of dictionaries and return the dictionaries where key x has a certain value.
Results:
Speed: list comprehension > generator expression >> normal list iteration >>> filter.
All scale linearly with the number of dicts in the list (10x list size -> 10x time).
The number of keys per dictionary does not affect speed significantly, even for large numbers (thousands) of keys. Please see this graph I calculated: https://imgur.com/a/quQzv (for method names, see below).
All tests were done with Python 3.6.4 on Windows 7 x64.
from random import randint
from timeit import timeit
list_dicts = []
for _ in range(1000):  # number of dicts in the list
    dict_tmp = {}
    for i in range(10):  # number of keys for each dict
        dict_tmp[f"key{i}"] = randint(0,50)
    list_dicts.append( dict_tmp )
def a():
    # normal iteration over all elements
    for dict_ in list_dicts:
        if dict_["key3"] == 20:
            pass
def b():
    # use 'generator'
    for dict_ in (x for x in list_dicts if x["key3"] == 20):
        pass
def c():
    # use 'list'
    for dict_ in [x for x in list_dicts if x["key3"] == 20]:
        pass
def d():
    # use 'filter'
    for dict_ in filter(lambda x: x['key3'] == 20, list_dicts):
        pass
Results:
1.7303 # normal list iteration
1.3849 # generator expression
1.3158 # list comprehension
7.7848 # filter
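The answer above does not show how the four functions were actually timed. One way such a comparison could be run is sketched below with timeit; the function names, the repeat count of 200, and the choice to have each function return its matches are assumptions made for illustration, and the timings it prints will differ from machine to machine.

```python
import random
from timeit import timeit

# Build 1000 dicts with 10 integer-valued keys each, as in the answer above.
list_dicts = [
    {f"key{i}": random.randint(0, 50) for i in range(10)}
    for _ in range(1000)
]

def plain_loop():
    found = []
    for d in list_dicts:
        if d["key3"] == 20:
            found.append(d)
    return found

def generator_expression():
    return list(x for x in list_dicts if x["key3"] == 20)

def list_comprehension():
    return [x for x in list_dicts if x["key3"] == 20]

def filter_lambda():
    return list(filter(lambda x: x["key3"] == 20, list_dicts))

candidates = [plain_loop, generator_expression, list_comprehension, filter_lambda]
# number=200 is an arbitrary repeat count chosen to keep the run short.
timings = {func.__name__: timeit(func, number=200) for func in candidates}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f} s")
```

Returning the matches from every function makes it easy to check that all four approaches find the same dictionaries before comparing their speed.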

people = [
{'name': "Tom", 'age': 10},
{'name': "Mark", 'age': 5},
{'name': "Pam", 'age': 7}
]
def search(name):
    for p in people:
        if p['name'] == name:
            return p
search("Pam")

Have you ever tried out the pandas package? It's perfect for this kind of search task and optimized too.
import pandas as pd
listOfDicts = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]
# Create a data frame, keys are used as column headers.
# Dict items with the same key are entered into the same respective column.
df = pd.DataFrame(listOfDicts)
# The pandas dataframe allows you to pick out specific values like so:
df2 = df[ (df['name'] == 'Pam') & (df['age'] == 7) ]
# Alternate syntax, same thing
df2 = df[ (df.name == 'Pam') & (df.age == 7) ]
I've added a little bit of benchmarking below to illustrate pandas' faster runtimes on a larger scale i.e. 100k+ entries:
setup_large = 'dicts = [];\
[dicts.extend(({ "name": "Tom", "age": 10 },{ "name": "Mark", "age": 5 },\
{ "name": "Pam", "age": 7 },{ "name": "Dick", "age": 12 })) for _ in range(25000)];\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(dicts);'
setup_small = 'dicts = [];\
dicts.extend(({ "name": "Tom", "age": 10 },{ "name": "Mark", "age": 5 },\
{ "name": "Pam", "age": 7 },{ "name": "Dick", "age": 12 }));\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(dicts);'
method1 = '[item for item in dicts if item["name"] == "Pam"]'
method2 = 'df[df["name"] == "Pam"]'
import timeit
t = timeit.Timer(method1, setup_small)
print('Small Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_small)
print('Small Method Pandas: ' + str(t.timeit(100)))
t = timeit.Timer(method1, setup_large)
print('Large Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_large)
print('Large Method Pandas: ' + str(t.timeit(100)))
#Small Method LC: 0.000191926956177
#Small Method Pandas: 0.044392824173
#Large Method LC: 1.98827004433
#Large Method Pandas: 0.324505090714

To add just a tiny bit to @FrédéricHamidi's answer.
In case you are not sure the key is present in every dict in the list, something like this would help:
next((item for item in dicts if item.get("name") and item["name"] == "Pam"), None)

Simply using list comprehension:
[i for i in dct if i['name'] == 'Pam'][0]
Sample code:
dct = [
{'name': 'Tom', 'age': 10},
{'name': 'Mark', 'age': 5},
{'name': 'Pam', 'age': 7}
]
print([i for i in dct if i['name'] == 'Pam'][0])
> {'age': 7, 'name': 'Pam'}

You can achieve this with the built-in filter and next functions in Python.
filter filters the given sequence and returns an iterator.
next accepts an iterator and returns its next element.
So you can find the element with:
my_dict = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]
next(filter(lambda obj: obj.get('name') == 'Pam', my_dict), None)
and the output is,
{'name': 'Pam', 'age': 7}
Note: The above code will return None if the name we are searching for is not found.

One simple way is to use a list comprehension. If l is the list
l = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]
then
[d['age'] for d in l if d['name']=='Tom']

def dsearch(lod, **kw):
    return filter(lambda i: all((i[k] == v for (k, v) in kw.items())), lod)
lod=[{'a':33, 'b':'test2', 'c':'a.ing333'},
{'a':22, 'b':'ihaha', 'c':'fbgval'},
{'a':33, 'b':'TEst1', 'c':'s.ing123'},
{'a':22, 'b':'ihaha', 'c':'dfdvbfjkv'}]
list(dsearch(lod, a=22))
[{'a': 22, 'b': 'ihaha', 'c': 'fbgval'},
{'a': 22, 'b': 'ihaha', 'c': 'dfdvbfjkv'}]
list(dsearch(lod, a=22, b='ihaha'))
[{'a': 22, 'b': 'ihaha', 'c': 'fbgval'},
{'a': 22, 'b': 'ihaha', 'c': 'dfdvbfjkv'}]
list(dsearch(lod, a=22, c='fbgval'))
[{'a': 22, 'b': 'ihaha', 'c': 'fbgval'}]

This is a general way of searching a value in a list of dictionaries:
def search_dictionaries(key, value, list_of_dictionaries):
    return [element for element in list_of_dictionaries if element[key] == value]

dicts=[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]
from collections import defaultdict
dicts_by_name=defaultdict(list)
for d in dicts:
    dicts_by_name[d['name']]=d
print(dicts_by_name['Tom'])
#output
#>>>
#{'age': 10, 'name': 'Tom'}

names = [{'name':'Tom', 'age': 10}, {'name': 'Mark', 'age': 5}, {'name': 'Pam', 'age': 7}]
resultlist = [d for d in names if d.get('name', '') == 'Pam']
first_result = resultlist[0]
This is one way...

You can try this:
''' lst: list of dictionaries '''
lst = [{"name": "Tom", "age": 10}, {"name": "Mark", "age": 5}, {"name": "Pam", "age": 7}]
search = input("What name: ")  # Input name that needs to be searched (say 'Pam')
print([d for d in lst if d["name"] == search][0])  # Output
>>> {'age': 7, 'name': 'Pam'}

Put the accepted answer in a function for easy reuse:
def get_item(collection, key, target):
    return next((item for item in collection if item[key] == target), None)
Or as a lambda:
get_item_lambda = lambda collection, key, target : next((item for item in collection if item[key] == target), None)
Result:
key = "name"
target = "Pam"
print(get_item(target_list, key, target))
print(get_item_lambda(target_list, key, target))
#{'name': 'Pam', 'age': 7}
#{'name': 'Pam', 'age': 7}
In case the key may be missing from some of the dictionaries, use dict.get to avoid a KeyError:
def get_item(collection, key, target):
    return next((item for item in collection if item.get(key, None) == target), None)
get_item_lambda = lambda collection, key, target : next((item for item in collection if item.get(key, None) == target), None)

My first thought would be that you might want to consider creating a dictionary of these dictionaries ... if, for example, you were going to be searching it more than a small number of times.
However that might be a premature optimization. What would be wrong with:
def get_records(key, store=()):
    '''Return a list of all records containing name==key from our store
    (store is an iterable of dicts, e.g. the list from the question)
    '''
    assert key is not None
    return [d for d in store if d['name']==key]

Most (if not all) implementations proposed here have two flaws:
They assume only one key is passed for searching, while it may be useful to pass several for complex dicts.
They assume all keys passed for searching exist in the dicts, hence they don't deal correctly with the KeyError that occurs when one does not.
An updated proposition:
def find_first_in_list(objects, **kwargs):
    return next((obj for obj in objects if
                 len(set(obj.keys()).intersection(kwargs.keys())) > 0 and
                 all([obj[k] == v for k, v in kwargs.items() if k in obj.keys()])),
                None)
Maybe not the most pythonic, but at least a bit more failsafe.
Usage:
>>> obj1 = find_first_in_list(list_of_dict, name='Pam', age=7)
>>> obj2 = find_first_in_list(list_of_dict, name='Pam', age=27)
>>> obj3 = find_first_in_list(list_of_dict, name='Pam', address='nowhere')
>>>
>>> print(obj1, obj2, obj3)
{'name': 'Pam', 'age': 7} None {'name': 'Pam', 'age': 7}

Here is a comparison of iterating through the list, using filter + lambda, and refactoring your code (if needed or valid for your case) to a dict of dicts rather than a list of dicts:
import time
# Build list of dicts
list_of_dicts = list()
for i in range(100000):
    list_of_dicts.append({'id': i, 'name': 'Tom'})
# Build dict of dicts
dict_of_dicts = dict()
for i in range(100000):
    dict_of_dicts[i] = {'name': 'Tom'}
# Find the one with ID of 99999
# 1. iterate through the list
lod_ts = time.time()
for elem in list_of_dicts:
    if elem['id'] == 99999:
        break
lod_tf = time.time()
lod_td = lod_tf - lod_ts
# 2. Use filter (list() forces evaluation, since filter is lazy in Python 3)
f_ts = time.time()
x = list(filter(lambda k: k['id'] == 99999, list_of_dicts))
f_tf = time.time()
f_td = f_tf - f_ts
# 3. find it in dict of dicts
dod_ts = time.time()
x = dict_of_dicts[99999]
dod_tf = time.time()
dod_td = dod_tf - dod_ts
print('List of Dictionaries took: %s' % lod_td)
print('Using filter took: %s' % f_td)
print('Dict of Dicts took: %s' % dod_td)
And the output is this:
List of Dictionaries took: 0.0099310874939
Using filter took: 0.0121960639954
Dict of Dicts took: 4.05311584473e-06
Conclusion:
Clearly, a dictionary of dicts is the most efficient structure to search when you know you will only be searching by id.
Interestingly, using filter is the slowest solution.

I would create a dict of dicts like so:
names = ["Tom", "Mark", "Pam"]
ages = [10, 5, 7]
my_d = {}
for i, j in zip(names, ages):
    my_d[i] = {"name": i, "age": j}
or, using exactly the same info as in the posted question:
info_list = [{"name": "Tom", "age": 10}, {"name": "Mark", "age": 5}, {"name": "Pam", "age": 7}]
my_d = {}
for d in info_list:
    my_d[d["name"]] = d
Then you could do my_d["Pam"] and get {"name": "Pam", "age": 7}
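One caveat with a plain dict keyed by name is that a later record silently replaces an earlier one with the same name. If names may repeat, a possible variation is to map each name to a list of records, as sketched below; the second "Pam" entry is invented purely to illustrate the case.

```python
from collections import defaultdict

info_list = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5},
    {"name": "Pam", "age": 7},
    {"name": "Pam", "age": 31},  # hypothetical duplicate name
]

# Map each name to every record carrying it.
by_name = defaultdict(list)
for record in info_list:
    by_name[record["name"]].append(record)

# .get avoids creating an empty entry when the name is unknown.
print(by_name.get("Pam", []))
print(by_name.get("Sam", []))
```

Building the index costs one pass over the list; each lookup afterwards avoids scanning the list again.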

Ducks will be a lot faster than a list comprehension or filter. It builds an index on your objects so lookups don't need to scan every item.
pip install ducks
from ducks import Dex
dicts = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]
# Build the index
dex = Dex(dicts, {'name': str, 'age': int})
# Find matching objects
dex[{'name': 'Pam', 'age': 7}]
Result: [{'name': 'Pam', 'age': 7}]

You have to go through all elements of the list. There is not a shortcut!
Unless somewhere else you keep a dictionary of the names pointing to the items of the list, but then you have to take care of the consequences of popping an element from your list.

I found this thread when I was searching for an answer to the same
question. While I realize that it's a late answer, I thought I'd
contribute it in case it's useful to anyone else:
def find_dict_in_list(dicts, default=None, **kwargs):
    """Find first matching :obj:`dict` in :obj:`list`.
    :param list dicts: List of dictionaries.
    :param dict default: Optional. Default dictionary to return.
        Defaults to `None`.
    :param **kwargs: `key=value` pairs to match in :obj:`dict`.
    :returns: First matching :obj:`dict` from `dicts`.
    :rtype: dict
    """
    rval = default
    for d in dicts:
        is_found = False
        # Search for keys in dict.
        for k, v in kwargs.items():
            if d.get(k, None) == v:
                is_found = True
            else:
                is_found = False
                break
        if is_found:
            rval = d
            break
    return rval
if __name__ == '__main__':
    # Tests
    dicts = []
    keys = 'spam eggs shrubbery knight'.split()
    start = 0
    for _ in range(4):
        dct = {k: v for k, v in zip(keys, range(start, start+4))}
        dicts.append(dct)
        start += 4
    # Find each dict based on 'spam' key only.
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam) == dicts[x]
    # Find each dict based on 'spam' and 'shrubbery' keys.
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam, shrubbery=spam+2) == dicts[x]
    # Search for one correct key, one incorrect key:
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam, shrubbery=spam+1) is None
    # Search for non-existent dict.
    for x in range(len(dicts)):
        spam = x+100
        assert find_dict_in_list(dicts, spam=spam) is None

How to get a dictionary of data in column1 as key and column2 as the value?

This question on Stack Overflow answers how to get a dictionary from tables using pymysql. However, that method outputs the column headers as keys and the data in each column as values.
What's the best way to have the actual data as keys and values?
For example:
Name |Age
-------------
John |25
Tom |45
Tammy |18
I want
{John:25, Tom:45, Tammy:18}
NOT
[{Name:John},{Age:25},....]
This is what I have right now:
def name2dict(name_list):
    name_list_tuple = tuple(name_list)
    conn = pymysql.connect()
    cur = conn.cursor(pymysql.cursors.DictCursor)
    Name2pos = """SELECT Tables.ID, Tables.Position FROM Tables where Tables.Name in %s"""
    cur.execute(Name2pos, [name_list_tuple])
    query_dict = cur.fetchall()
    cur.close()
    conn.close()
    return query_dict

Don't use a dictionary cursor; use the normal one instead. Here is a simple example slightly adapting your code (I can't check it, so I'm assuming it runs okay); it can certainly be improved:
def name2dict(name_list):
    name_list_tuple = tuple(name_list)
    conn = pymysql.connect()
    cur = conn.cursor()
    Name2pos = """SELECT Tables.ID, Tables.Position FROM Tables where Tables.Name in %s"""
    cur.execute(Name2pos, [name_list_tuple])
    # each row is an (ID, Position) tuple, so dict() maps ID -> Position
    query_dict = dict(cur.fetchall())
    cur.close()
    conn.close()
    return query_dict
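The key idea is that a plain cursor returns each row as a tuple, and dict() accepts any iterable of two-item tuples. The sketch below demonstrates this with the standard library's sqlite3 module as a stand-in for pymysql, so it can run without a database server; the table name, column names, and rows are invented for the example, and sqlite3 uses ? placeholders where pymysql uses %s.

```python
import sqlite3

# In-memory database standing in for the MySQL server (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("John", 25), ("Tom", 45), ("Tammy", 18)],
)

names = ("John", "Tammy")
# One "?" placeholder per name keeps the query parameterized.
placeholders = ", ".join("?" for _ in names)
cur = conn.execute(
    f"SELECT name, age FROM people WHERE name IN ({placeholders}) ORDER BY name",
    names,
)

# Each row is a (name, age) tuple, so dict() maps name -> age directly.
name_to_age = dict(cur.fetchall())
conn.close()
print(name_to_age)
```

This only works when the query selects exactly two columns; with more columns, a dict comprehension over the rows would be needed instead.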

It's not clear to me what the structure of your current data is, so I guess I'll just write a separate answer for each one!
# Case 1: a dict of columns
d = {
    "Name": ["John", "Tom", "Tammy"],
    "Age": [25,45,18]
}
new_d = dict(zip(d["Name"], d["Age"]))
print(new_d)
# Case 2: a list of row dicts
rows = [
    {"Name": "John", "Age": 25},
    {"Name": "Tom", "Age": 45},
    {"Name": "Tammy", "Age": 18},
]
new_d = {row["Name"]: row["Age"] for row in rows}
print(new_d)
# Case 3: a list of alternating single-key dicts
data = [
    {"Name": "John"},
    {"Age": 25},
    {"Name": "Tom"},
    {"Age": 45},
    {"Name": "Tammy"},
    {"Age": 18},
]
d = {
    "Name": [item["Name"] for item in data if "Name" in item],
    "Age": [item["Age"] for item in data if "Age" in item],
}
new_d = dict(zip(d["Name"], d["Age"]))
print(new_d)
In any case, the result is:
{'John': 25, 'Tom': 45, 'Tammy': 18}

Python list of dictionaries search

Assume I have this:
[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]
and by searching for "Pam" as the name, I want to retrieve the matching dictionary: {"name": "Pam", "age": 7}
How can I achieve this?

You can use a generator expression:
>>> dicts = [
... { "name": "Tom", "age": 10 },
... { "name": "Mark", "age": 5 },
... { "name": "Pam", "age": 7 },
... { "name": "Dick", "age": 12 }
... ]
>>> next(item for item in dicts if item["name"] == "Pam")
{'age': 7, 'name': 'Pam'}
If you need to handle the item not being there, then you can do what user Matt suggested in his comment and provide a default using a slightly different API:
next((item for item in dicts if item["name"] == "Pam"), None)
And to find the index of the item, rather than the item itself, you can enumerate() the list:
next((i for i, item in enumerate(dicts) if item["name"] == "Pam"), None)

This looks to me the most pythonic way:
people = [
{'name': "Tom", 'age': 10},
{'name': "Mark", 'age': 5},
{'name': "Pam", 'age': 7}
]
filter(lambda person: person['name'] == 'Pam', people)
result (returned as a list in Python 2):
[{'age': 7, 'name': 'Pam'}]
Note: In Python 3, a filter object is returned. So the python3 solution would be:
list(filter(lambda person: person['name'] == 'Pam', people))

#Frédéric Hamidi's answer is great. In Python 3.x the syntax for .next() changed slightly. Thus a slight modification:
>>> dicts = [
{ "name": "Tom", "age": 10 },
{ "name": "Mark", "age": 5 },
{ "name": "Pam", "age": 7 },
{ "name": "Dick", "age": 12 }
]
>>> next(item for item in dicts if item["name"] == "Pam")
{'age': 7, 'name': 'Pam'}
As mentioned in the comments by #Matt, you can add a default value as such:
>>> next((item for item in dicts if item["name"] == "Pam"), False)
{'name': 'Pam', 'age': 7}
>>> next((item for item in dicts if item["name"] == "Sam"), False)
False
>>>

You can use a list comprehension:
def search(name, people):
return [element for element in people if element['name'] == name]

I tested various methods to go through a list of dictionaries and return the dictionaries where key x has a certain value.
Results:
Speed: list comprehension > generator expression >> normal list iteration >>> filter.
All scale linear with the number of dicts in the list (10x list size -> 10x time).
The keys per dictionary does not affect speed significantly for large amounts (thousands) of keys. Please see this graph I calculated: https://imgur.com/a/quQzv (method names see below).
All tests done with Python 3.6.4, W7x64.
from random import randint
from timeit import timeit
list_dicts = []
for _ in range(1000): # number of dicts in the list
dict_tmp = {}
for i in range(10): # number of keys for each dict
dict_tmp[f"key{i}"] = randint(0,50)
list_dicts.append( dict_tmp )
def a():
# normal iteration over all elements
for dict_ in list_dicts:
if dict_["key3"] == 20:
pass
def b():
# use 'generator'
for dict_ in (x for x in list_dicts if x["key3"] == 20):
pass
def c():
# use 'list'
for dict_ in [x for x in list_dicts if x["key3"] == 20]:
pass
def d():
# use 'filter'
for dict_ in filter(lambda x: x['key3'] == 20, list_dicts):
pass
Results:
1.7303 # normal list iteration
1.3849 # generator expression
1.3158 # list comprehension
7.7848 # filter

people = [
{'name': "Tom", 'age': 10},
{'name': "Mark", 'age': 5},
{'name': "Pam", 'age': 7}
]
def search(name):
for p in people:
if p['name'] == name:
return p
search("Pam")

Have you ever tried out the pandas package? It's perfect for this kind of search task and optimized too.
import pandas as pd
listOfDicts = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]
# Create a data frame, keys are used as column headers.
# Dict items with the same key are entered into the same respective column.
df = pd.DataFrame(listOfDicts)
# The pandas dataframe allows you to pick out specific values like so:
df2 = df[ (df['name'] == 'Pam') & (df['age'] == 7) ]
# Alternate syntax, same thing
df2 = df[ (df.name == 'Pam') & (df.age == 7) ]
I've added a little bit of benchmarking below to illustrate pandas' faster runtimes on a larger scale i.e. 100k+ entries:
setup_large = 'dicts = [];\
[dicts.extend(({ "name": "Tom", "age": 10 },{ "name": "Mark", "age": 5 },\
{ "name": "Pam", "age": 7 },{ "name": "Dick", "age": 12 })) for _ in range(25000)];\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(dicts);'
setup_small = 'dicts = [];\
dicts.extend(({ "name": "Tom", "age": 10 },{ "name": "Mark", "age": 5 },\
{ "name": "Pam", "age": 7 },{ "name": "Dick", "age": 12 }));\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(dicts);'
method1 = '[item for item in dicts if item["name"] == "Pam"]'
method2 = 'df[df["name"] == "Pam"]'
import timeit
t = timeit.Timer(method1, setup_small)
print('Small Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_small)
print('Small Method Pandas: ' + str(t.timeit(100)))
t = timeit.Timer(method1, setup_large)
print('Large Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_large)
print('Large Method Pandas: ' + str(t.timeit(100)))
#Small Method LC: 0.000191926956177
#Small Method Pandas: 0.044392824173
#Large Method LC: 1.98827004433
#Large Method Pandas: 0.324505090714

To add just a tiny bit to #FrédéricHamidi.
In case you are not sure a key is in the the list of dicts, something like this would help:
next((item for item in dicts if item.get("name") and item["name"] == "Pam"), None)

Simply using list comprehension:
[i for i in dct if i['name'] == 'Pam'][0]
Sample code:
dct = [
{'name': 'Tom', 'age': 10},
{'name': 'Mark', 'age': 5},
{'name': 'Pam', 'age': 7}
]
print([i for i in dct if i['name'] == 'Pam'][0])
> {'age': 7, 'name': 'Pam'}

You can achieve this with the usage of filter and next methods in Python.
filter method filters the given sequence and returns an iterator.
next method accepts an iterator and returns the next element in the list.
So you can find the element by,
my_dict = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]
next(filter(lambda obj: obj.get('name') == 'Pam', my_dict), None)
and the output is,
{'name': 'Pam', 'age': 7}
Note: The above code will return None incase if the name we are searching is not found.

One simple way using list comprehensions is , if l is the list
l = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]
then
[d['age'] for d in l if d['name']=='Tom']

def dsearch(lod, **kw):
return filter(lambda i: all((i[k] == v for (k, v) in kw.items())), lod)
lod=[{'a':33, 'b':'test2', 'c':'a.ing333'},
{'a':22, 'b':'ihaha', 'c':'fbgval'},
{'a':33, 'b':'TEst1', 'c':'s.ing123'},
{'a':22, 'b':'ihaha', 'c':'dfdvbfjkv'}]
list(dsearch(lod, a=22))
[{'a': 22, 'b': 'ihaha', 'c': 'fbgval'},
{'a': 22, 'b': 'ihaha', 'c': 'dfdvbfjkv'}]
list(dsearch(lod, a=22, b='ihaha'))
[{'a': 22, 'b': 'ihaha', 'c': 'fbgval'},
{'a': 22, 'b': 'ihaha', 'c': 'dfdvbfjkv'}]
list(dsearch(lod, a=22, c='fbgval'))
[{'a': 22, 'b': 'ihaha', 'c': 'fbgval'}]

This is a general way of searching a value in a list of dictionaries:
def search_dictionaries(key, value, list_of_dictionaries):
return [element for element in list_of_dictionaries if element[key] == value]

dicts=[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]
from collections import defaultdict
dicts_by_name=defaultdict(list)
for d in dicts:
dicts_by_name[d['name']]=d
print dicts_by_name['Tom']
#output
#>>>
#{'age': 10, 'name': 'Tom'}

names = [{'name':'Tom', 'age': 10}, {'name': 'Mark', 'age': 5}, {'name': 'Pam', 'age': 7}]
resultlist = [d for d in names if d.get('name', '') == 'Pam']
first_result = resultlist[0]
This is one way...

You can try this:
# lst: list of dictionaries
lst = [{"name": "Tom", "age": 10}, {"name": "Mark", "age": 5}, {"name": "Pam", "age": 7}]
search = input("What name: ")  # Name to search for (say 'Pam')
print([d for d in lst if d["name"] == search][0])  # Output
>>> {'name': 'Pam', 'age': 7}

Put the accepted answer in a function for easy reuse:
def get_item(collection, key, target):
    return next((item for item in collection if item[key] == target), None)
Or as a lambda:
get_item_lambda = lambda collection, key, target: next((item for item in collection if item[key] == target), None)
Result:
target_list = [{"name": "Tom", "age": 10}, {"name": "Mark", "age": 5}, {"name": "Pam", "age": 7}]
key = "name"
target = "Pam"
print(get_item(target_list, key, target))
print(get_item_lambda(target_list, key, target))
#{'name': 'Pam', 'age': 7}
#{'name': 'Pam', 'age': 7}
If the key may be missing from some of the dictionaries, use dict.get to avoid a KeyError:
def get_item(collection, key, target):
    return next((item for item in collection if item.get(key, None) == target), None)
get_item_lambda = lambda collection, key, target: next((item for item in collection if item.get(key, None) == target), None)
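One caveat with item.get(key, None): if target is itself None, dictionaries that lack the key also match. A sketch of one way around that, using a private sentinel object (the names _MISSING and get_item_strict are illustrative and not from the answer):

```python
_MISSING = object()  # hypothetical sentinel; never equal to any real value


def get_item_strict(collection, key, target):
    """Return the first item that has `key` set to `target`, else None."""
    return next(
        (item for item in collection if item.get(key, _MISSING) == target),
        None,
    )


target_list = [
    {"name": "Tom", "age": 10},
    {"name": "Pam"},               # no 'age' key
    {"name": "Ann", "age": None},  # 'age' present but None
]

print(get_item_strict(target_list, "name", "Pam"))  # {'name': 'Pam'}
print(get_item_strict(target_list, "age", None))    # {'name': 'Ann', 'age': None}
```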

My first thought would be that you might want to consider creating a dictionary of these dictionaries ... if, for example, you were going to be searching it more than a small number of times.
However, that might be a premature optimization. What would be wrong with:
def get_records(key, store=()):
    '''Return a list of all records in store whose 'name' equals key.'''
    assert key is not None
    return [d for d in store if d['name'] == key]

Most (if not all) implementations proposed here have two flaws:
They assume only one key is passed for searching, while it may be useful to search on several keys for complex dicts.
They assume all keys passed for searching exist in the dicts, so they raise a KeyError when one does not.
An updated proposal:
def find_first_in_list(objects, **kwargs):
    return next((obj for obj in objects if
                 len(set(obj.keys()).intersection(kwargs.keys())) > 0 and
                 all([obj[k] == v for k, v in kwargs.items() if k in obj.keys()])),
                None)
Maybe not the most Pythonic, but at least a bit more failsafe.
Usage:
>>> obj1 = find_first_in_list(list_of_dict, name='Pam', age=7)
>>> obj2 = find_first_in_list(list_of_dict, name='Pam', age=27)
>>> obj3 = find_first_in_list(list_of_dict, name='Pam', address='nowhere')
>>>
>>> print(obj1, obj2, obj3)
{'name': 'Pam', 'age': 7} None {'name': 'Pam', 'age': 7}
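As the third call shows, find_first_in_list ignores search keys that a dictionary does not have. If a missing key should instead count as a mismatch, one possible variant (the names find_first_strict and _MISSING are made up here) is:

```python
_MISSING = object()  # hypothetical sentinel for "key not present"


def find_first_strict(objects, **kwargs):
    """Return the first dict matching every key=value pair, else None."""
    return next(
        (obj for obj in objects
         if all(obj.get(k, _MISSING) == v for k, v in kwargs.items())),
        None,
    )


list_of_dict = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5},
    {"name": "Pam", "age": 7},
]

print(find_first_strict(list_of_dict, name="Pam", age=7))
# {'name': 'Pam', 'age': 7}
print(find_first_strict(list_of_dict, name="Pam", address="nowhere"))
# None
```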

Here is a comparison of iterating through the list, using filter with a lambda, and refactoring your code (if needed or valid in your case) to a dict of dicts rather than a list of dicts:
import time
# Build list of dicts
list_of_dicts = list()
for i in range(100000):
    list_of_dicts.append({'id': i, 'name': 'Tom'})
# Build dict of dicts
dict_of_dicts = dict()
for i in range(100000):
    dict_of_dicts[i] = {'name': 'Tom'}
# Find the one with ID of 99999
# 1. iterate through the list
lod_ts = time.time()
for elem in list_of_dicts:
    if elem['id'] == 99999:
        break
lod_tf = time.time()
lod_td = lod_tf - lod_ts
# 2. Use filter (list() forces evaluation, since filter is lazy in Python 3)
f_ts = time.time()
x = list(filter(lambda k: k['id'] == 99999, list_of_dicts))
f_tf = time.time()
f_td = f_tf - f_ts
# 3. find it in dict of dicts
dod_ts = time.time()
x = dict_of_dicts[99999]
dod_tf = time.time()
dod_td = dod_tf - dod_ts
print('List of Dictionaries took: %s' % lod_td)
print('Using filter took: %s' % f_td)
print('Dict of Dicts took: %s' % dod_td)
And the output is this:
List of Dictionaries took: 0.0099310874939
Using filter took: 0.0121960639954
Dict of Dicts took: 4.05311584473e-06
Conclusion:
Clearly, a dictionary of dicts is the most efficient structure to search when you know you will be looking records up by id only.
Interestingly, using filter is the slowest solution.
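Single time.time() measurements like these are noisy; the standard-library timeit module repeats each snippet and is usually a better basis for comparison. A rough sketch under the same setup follows. The function names are illustrative, and no timings are shown because they depend on the machine.

```python
import timeit

N = 100_000
list_of_dicts = [{"id": i, "name": "Tom"} for i in range(N)]
dict_of_dicts = {i: {"name": "Tom"} for i in range(N)}
target = N - 1  # worst case for a linear scan


def scan_list():
    # Generator + next() stops at the first match.
    return next(d for d in list_of_dicts if d["id"] == target)


def use_filter():
    # filter() is lazy in Python 3; next() pulls the first match.
    return next(filter(lambda d: d["id"] == target, list_of_dicts))


def lookup_dict():
    # Direct hash lookup by key.
    return dict_of_dicts[target]


for fn in (scan_list, use_filter, lookup_dict):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds / 20:.6f} s per call")
```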

I would create a dict of dicts like so:
names = ["Tom", "Mark", "Pam"]
ages = [10, 5, 7]
my_d = {}
for i, j in zip(names, ages):
my_d[i] = {"name": i, "age": j}
or, using exactly the same info as in the posted question:
info_list = [{"name": "Tom", "age": 10}, {"name": "Mark", "age": 5}, {"name": "Pam", "age": 7}]
my_d = {}
for d in info_list:
my_d[d["name"]] = d
Then you could do my_d["Pam"] and get {"name": "Pam", "age": 7}
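The same index can be built in one line with a dict comprehension. This sketch assumes names are unique; if two entries share a name, the later one overwrites the earlier.

```python
info_list = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5},
    {"name": "Pam", "age": 7},
]

# Map each name to its dictionary; later duplicates replace earlier ones.
my_d = {d["name"]: d for d in info_list}

print(my_d["Pam"])      # {'name': 'Pam', 'age': 7}
print(my_d.get("Zoe"))  # None instead of a KeyError for unknown names
```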

The ducks library can be a lot faster than a list comprehension or filter for repeated lookups. It builds an index on your objects so lookups don't need to scan every item.
pip install ducks
from ducks import Dex
dicts = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]
# Build the index
dex = Dex(dicts, {'name': str, 'age': int})
# Find matching objects
dex[{'name': 'Pam', 'age': 7}]
Result: [{'name': 'Pam', 'age': 7}]

You have to go through all elements of the list. There is no shortcut!
Unless you keep, somewhere else, a dictionary mapping the names to the items of the list; but then you have to handle the consequences of removing an element from your list.
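A minimal sketch of that idea: a small wrapper that keeps a name index alongside the list and updates both on every add and remove. The class name NameIndexedList and its methods are invented for illustration, and names are assumed to be unique.

```python
class NameIndexedList:
    """List of dicts plus a name -> dict index (hypothetical helper)."""

    def __init__(self, items=()):
        self.items = []
        self.by_name = {}
        for item in items:
            self.add(item)

    def add(self, item):
        # Update the list and the index together so they never diverge.
        self.items.append(item)
        self.by_name[item["name"]] = item

    def remove(self, name):
        # Drop from the index first, then from the list (O(n) for the list).
        item = self.by_name.pop(name)
        self.items.remove(item)
        return item

    def find(self, name):
        # O(1) lookup; returns None when the name is unknown.
        return self.by_name.get(name)


people = NameIndexedList([
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5},
    {"name": "Pam", "age": 7},
])
print(people.find("Pam"))   # {'name': 'Pam', 'age': 7}
people.remove("Pam")
print(people.find("Pam"))   # None
print(len(people.items))    # 2
```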

I found this thread when I was searching for an answer to the same
question. While I realize that it's a late answer, I thought I'd
contribute it in case it's useful to anyone else:
def find_dict_in_list(dicts, default=None, **kwargs):
    """Find first matching :obj:`dict` in :obj:`list`.
    :param list dicts: List of dictionaries.
    :param dict default: Optional. Default dictionary to return.
        Defaults to `None`.
    :param **kwargs: `key=value` pairs to match in :obj:`dict`.
    :returns: First matching :obj:`dict` from `dicts`.
    :rtype: dict
    """
    rval = default
    for d in dicts:
        is_found = False
        # Search for keys in dict.
        for k, v in kwargs.items():
            if d.get(k, None) == v:
                is_found = True
            else:
                is_found = False
                break
        if is_found:
            rval = d
            break
    return rval
if __name__ == '__main__':
    # Tests
    dicts = []
    keys = 'spam eggs shrubbery knight'.split()
    start = 0
    for _ in range(4):
        dct = {k: v for k, v in zip(keys, range(start, start+4))}
        dicts.append(dct)
        start += 4
    # Find each dict based on 'spam' key only.
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam) == dicts[x]
    # Find each dict based on 'spam' and 'shrubbery' keys.
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam, shrubbery=spam+2) == dicts[x]
    # Search for one correct key, one incorrect key:
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam, shrubbery=spam+1) is None
    # Search for non-existent dict.
    for x in range(len(dicts)):
        spam = x+100
        assert find_dict_in_list(dicts, spam=spam) is None


You mentioned Pandas in a comment. If that is an option, then you could try:
import pandas as pd
df_A = pd.read_csv("A.csv")
df_B = pd.read_csv("B.csv")
result = df_A.merge(
    df_B.groupby("Age")["Class"].agg(list), on="Age", how="left"
).to_dict("records")
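For readers without pandas, roughly the same left join can be sketched with the standard library. The CSV contents below are inlined via io.StringIO purely for illustration and are an assumption based on the question; with real files you would pass open file objects to csv.DictReader. Note that csv yields every value as a string, and that unmatched rows get an empty list here, whereas the pandas left merge leaves a missing value.

```python
import csv
import io
from collections import defaultdict

# Stand-ins for A.csv and B.csv (assumed contents, based on the question).
a_csv = io.StringIO("Name,Age\nAlex,17\nBob,20\nClark,24\n")
b_csv = io.StringIO("Age,Class\n17,Economy\n24,IT\n17,Arts\n")

# Group classes by age: {'17': ['Economy', 'Arts'], '24': ['IT']}
classes_by_age = defaultdict(list)
for row in csv.DictReader(b_csv):
    classes_by_age[row["Age"]].append(row["Class"])

# Left join: keep every row of A and attach its list of classes.
result = [
    {**row, "Class": classes_by_age.get(row["Age"], [])}
    for row in csv.DictReader(a_csv)
]

for record in result:
    print(record)
# {'Name': 'Alex', 'Age': '17', 'Class': ['Economy', 'Arts']}
# {'Name': 'Bob', 'Age': '20', 'Class': []}
# {'Name': 'Clark', 'Age': '24', 'Class': ['IT']}
```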
