Merging 2 lists of dicts based on common values - python

So I have 2 lists of dicts, as follows:
list1 = [
{'name':'john',
'gender':'male',
'grade': 'third'
},
{'name':'cathy',
'gender':'female',
'grade':'second'
},
]
list2 = [
{'name':'john',
'physics':95,
'chemistry':89
},
{'name':'cathy',
'physics':78,
'chemistry':69
},
]
The output list I need is as follows:
final_list = [
{'name':'john',
'gender':'male',
'grade':'third',
'marks': {'physics':95, 'chemistry': 89}
},
{'name':'cathy',
'gender':'female',
'grade':'second',
'marks': {'physics':78, 'chemistry': 69}
},
]
First I tried iterating, as follows:
final_list = []
for item1 in list1:
    for item2 in list2:
        if item1['name'] == item2['name']:
            temp = dict(item2)
            temp.pop('name')
            final_list.append(dict(name=item1['name'], **temp))
However, this does not give me the desired result. I also tried pandas, though I have limited experience there:
>>> import pandas as pd
>>> df1 = pd.DataFrame(list1)
>>> df2 = pd.DataFrame(list2)
>>> result = pd.merge(df1, df2, on=['name'])
However, I am clueless how to get the data back into the format I need. Any help is appreciated.

You can first merge both dataframes
In [144]: df = pd.DataFrame(list1).merge(pd.DataFrame(list2))
Which would look like,
In [145]: df
Out[145]:
gender grade name chemistry physics
0 male third john 89 95
1 female second cathy 69 78
Then create a marks column holding the two scores as a dict (wrapped in a one-element list here):
In [146]: df['marks'] = df.apply(lambda x: [x[['chemistry', 'physics']].to_dict()], axis=1)
In [147]: df
Out[147]:
gender grade name chemistry physics \
0 male third john 89 95
1 female second cathy 69 78
marks
0 [{u'chemistry': 89, u'physics': 95}]
1 [{u'chemistry': 69, u'physics': 78}]
Then call the to_dict(orient='records') method on the selected columns of the dataframe:
In [148]: df[['name', 'gender', 'grade', 'marks']].to_dict(orient='records')
Out[148]:
[{'gender': 'male',
'grade': 'third',
'marks': [{'chemistry': 89L, 'physics': 95L}],
'name': 'john'},
{'gender': 'female',
'grade': 'second',
'marks': [{'chemistry': 69L, 'physics': 78L}],
'name': 'cathy'}]

Using your pandas approach, you can call
result.to_dict(orient='records')
to get it back as a list of dictionaries. It won't put marks in as a sub-field though, since there's nothing telling it to do that. physics and chemistry will just be fields on the same level as the rest.
You may also be having problems because your name is 'cathy' in the first list and 'kathy' in the second, which naturally won't get merged.
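The nesting step that to_dict(orient='records') does not do can be added afterwards in plain Python. The following is only a minimal sketch: the helper name nest_marks and the MARK_FIELDS tuple are made up for illustration, and the flat records are written out by hand (in the shape the merged dataframe would produce) so the snippet runs without pandas.

```python
# Flat records, shaped like the output of result.to_dict(orient='records').
flat_records = [
    {'name': 'john', 'gender': 'male', 'grade': 'third',
     'physics': 95, 'chemistry': 89},
    {'name': 'cathy', 'gender': 'female', 'grade': 'second',
     'physics': 78, 'chemistry': 69},
]

# Assumption: these are the only subject columns that belong under 'marks'.
MARK_FIELDS = ('physics', 'chemistry')


def nest_marks(record, mark_fields=MARK_FIELDS):
    """Return a new dict with the mark fields moved under a 'marks' key."""
    nested = {k: v for k, v in record.items() if k not in mark_fields}
    nested['marks'] = {k: record[k] for k in mark_fields if k in record}
    return nested


final_list = [nest_marks(r) for r in flat_records]
```

The input records are left unchanged, so the flat form stays available if it is needed elsewhere.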

Create a function that adds a marks column; this column should contain a dictionary of the physics and chemistry marks:
def create_marks(row):
    # With axis=1, apply passes each row to the function as a Series.
    row['marks'] = {'chemistry': row['chemistry'], 'physics': row['physics']}
    return row
result_with_marks = result.apply(create_marks, axis=1)
Out[19]:
gender grade name chemistry physics marks
male third john 89 95 {u'chemistry': 89, u'physics': 95}
female second cathy 69 78 {u'chemistry': 69, u'physics': 78}
Then convert it to your desired result as follows:
result_with_marks.drop( ['chemistry' , 'physics'], axis = 1).to_dict(orient = 'records')
Out[20]:
[{'gender': 'male',
'grade': 'third',
'marks': {'chemistry': 89L, 'physics': 95L},
'name': 'john'},
{'gender': 'female',
'grade': 'second',
'marks': {'chemistry': 69L, 'physics': 78L},
'name': 'cathy'}]

Since you want a list of dicts as output, you can easily do this without pandas. Use a dict to store all the info, with the names as the outer keys. This takes one pass over each list, unlike the O(n^2) double loop in your own code:
from pprint import pprint as pp
out = {d["name"]: d for d in list1}
for d in list2:
    # pop removes "name" from the dict in list2, leaving only the marks
    out[d.pop("name")]["marks"] = d
pp(list(out.values()))
Output:
[{'gender': 'female',
'grade': 'second',
'marks': {'chemistry': 69, 'physics': 78},
'name': 'cathy'},
{'gender': 'male',
'grade': 'third',
'marks': {'chemistry': 89, 'physics': 95},
'name': 'john'}]
That reuses (and mutates) the dicts in your lists. If you want to create new dicts and leave the inputs untouched:
from pprint import pprint as pp
out = {d["name"]: d.copy() for d in list1}
for d in list2:
    marks = d.copy()  # copy first so the dict in list2 is not modified
    k = marks.pop("name")
    out[k]["marks"] = marks
pp(list(out.values()))
The output is the same:
[{'gender': 'female',
'grade': 'second',
'marks': {'chemistry': 69, 'physics': 78},
'name': 'cathy'},
{'gender': 'male',
'grade': 'third',
'marks': {'chemistry': 89, 'physics': 95},
'name': 'john'}]
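The dict-as-index idea can also be wrapped in a function that tolerates names present in only one list. This is a hedged sketch rather than the answer's own code: the function name merge_marks, its key parameter, the extra 'kathy' entry, and the decision to silently skip unmatched score entries are all assumptions chosen for illustration.

```python
def merge_marks(people, scores, key="name"):
    """Return new dicts from `people` with matching `scores` nested under 'marks'.

    Neither input list is modified. Score entries whose key has no match in
    `people` are skipped (an assumption; raising an error is equally valid).
    """
    merged = {p[key]: dict(p) for p in people}  # one pass over the first list
    for s in scores:                            # one pass over the second list
        target = merged.get(s[key])
        if target is None:
            continue
        target["marks"] = {k: v for k, v in s.items() if k != key}
    return list(merged.values())


list1 = [
    {'name': 'john', 'gender': 'male', 'grade': 'third'},
    {'name': 'cathy', 'gender': 'female', 'grade': 'second'},
]
list2 = [
    {'name': 'john', 'physics': 95, 'chemistry': 89},
    {'name': 'cathy', 'physics': 78, 'chemistry': 69},
    {'name': 'kathy', 'physics': 50, 'chemistry': 60},  # hypothetical unmatched entry
]

final_list = merge_marks(list1, list2)
```

On Python 3.7 and later, dicts keep insertion order, so the result follows the order of the first list.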

Related

Convert Pandas Dataframe to nested json-keep 2 columns

I have a DF with the following columns and data, and I hope it can be converted to two columns, studentid and info, in the format shown below.
The dataset is:
studentid course teacher grade rank
1 math A 91 1
1 history B 79 2
2 math A 88 2
2 history B 83 1
3 math A 85 3
3 history B 76 3
and the desired output is:
studentid info
1 "{""math"":[{""teacher"":""A"",""grade"":91,""rank"":1}],
""history"":[{""teacher"":""B"",""grade"":79,""rank"":2}]}"
2 "{""math"":[{""teacher"":""A"",""grade"":88,""rank"":2}],
""history"":[{""teacher"":""B"",""grade"":83,""rank"":1}]}"
3 "{""math"":[{""teacher"":""A"",""grade"":85,""rank"":3}],
""history"":[{""teacher"":""B"",""grade"":76,""rank"":3}]}"
You don't really need groupby(), and the individual sub-dictionaries shouldn't really be in lists; they can simply be the values of the nested dict. After setting the columns you want as the index, df.to_dict() gives you a dictionary keyed by (studentid, course):
df = df.set_index(['studentid','course'])
df.to_dict(orient='index')
Outputs:
{(1, 'math'): {'teacher': 'A', 'grade': 91, 'rank': 1},
(1, 'history'): {'teacher': 'B', 'grade': 79, 'rank': 2},
(2, 'math'): {'teacher': 'A', 'grade': 88, 'rank': 2},
(2, 'history'): {'teacher': 'B', 'grade': 83, 'rank': 1},
(3, 'math'): {'teacher': 'A', 'grade': 85, 'rank': 3},
(3, 'history'): {'teacher': 'B', 'grade': 76, 'rank': 3}}
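If one nested entry per student is wanted, the tuple-keyed dictionary can be regrouped in plain Python. The sketch below is an illustration under assumptions: the names by_pair, nested and info_json are made up, only the first two students are copied in by hand, and the course details are stored as plain dicts rather than the one-element lists shown in the question.

```python
import json

# A hand-copied subset of the tuple-keyed output shown above.
by_pair = {
    (1, 'math'): {'teacher': 'A', 'grade': 91, 'rank': 1},
    (1, 'history'): {'teacher': 'B', 'grade': 79, 'rank': 2},
    (2, 'math'): {'teacher': 'A', 'grade': 88, 'rank': 2},
    (2, 'history'): {'teacher': 'B', 'grade': 83, 'rank': 1},
}

# Regroup: studentid -> course -> details.
nested = {}
for (studentid, course), details in by_pair.items():
    nested.setdefault(studentid, {})[course] = details

# One JSON string per student, similar to the "info" column in the question.
info_json = {sid: json.dumps(courses) for sid, courses in nested.items()}
```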
Considering that the initial dataframe is df, there are various options, depending on the exact desired output.
If one wants the info column to be a dictionary of lists, this will do the work
df_new = df.groupby('studentid').apply(lambda x: x.drop('studentid', axis=1).to_dict(orient='list')).reset_index(name='info')
[Out]:
studentid info
0 1 {'course': ['math', 'history'], 'teacher': ['A...
1 2 {'course': ['math', 'history'], 'teacher': ['A...
2 3 {'course': ['math', 'history'], 'teacher': ['A...
If one wants a list of dictionaries, then do the following
df_new = df.groupby('studentid').apply(lambda x: x.drop('studentid', axis=1).to_dict(orient='records')).reset_index(name='info')
[Out]:
studentid info
0 1 [{'course': 'math', 'teacher': 'A', 'grade': 9...
1 2 [{'course': 'math', 'teacher': 'A', 'grade': 8...
2 3 [{'course': 'math', 'teacher': 'A', 'grade': 8...

Python Pandas to group values in 2 columns

I have a data frame like the one below. The names fall into 5 groups, linked by the common value in column A.
I want to group the names. I tried:
import pandas as pd
data = {'A': ["James","James","James","Edward","Edward","Thomas","Thomas","Jason","Jason","Jason","Brian","Brian"],
'B' : ["John","Michael","William","David","Joseph","Christopher","Daniel","George","Kenneth","Steven","Ronald","Anthony"]}
df = pd.DataFrame(data)
df_1 = df.groupby('A')['B'].apply(list)
df_1 = df_1.to_frame().reset_index()
for index, row in df_1.iterrows():
    print (row['A'], row['B'])
the outputs are:
('Brian', ['Ronald', 'Anthony'])
('Edward', ['David', 'Joseph'])
('James', ['John', 'Michael', 'William'])
('Jason', ['George', 'Kenneth', 'Steven'])
('Thomas', ['Christopher', 'Daniel'])
but I want one list for each group (it would be even better if there's an automatic way to assign a variable to each list), like:
['Brian', 'Ronald', 'Anthony']
['Edward', 'David', 'Joseph']
['James', 'John', 'Michael', 'William']
['Jason', 'George', 'Kenneth', 'Steven']
['Thomas', 'Christopher', 'Daniel']
I tried row['B'].append(row['A']) but it returns None.
What's the right way to group them? thank you.
You can prepend the value of the grouping column A inside GroupBy.apply by using the .name attribute:
s = df.groupby('A')['B'].apply(lambda x: [x.name] + list(x))
print (s)
A
Brian [Brian, Ronald, Anthony]
Edward [Edward, David, Joseph]
James [James, John, Michael, William]
Jason [Jason, George, Kenneth, Steven]
Thomas [Thomas, Christopher, Daniel]
Name: B, dtype: object
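On the side wish for "a variable for each list": a dict keyed by the group name usually serves that purpose better than generating variables. The following is a minimal pandas-free sketch of the same grouping; the name groups is an assumption, and data is the dictionary from the question.

```python
data = {'A': ["James", "James", "James", "Edward", "Edward", "Thomas", "Thomas",
              "Jason", "Jason", "Jason", "Brian", "Brian"],
        'B': ["John", "Michael", "William", "David", "Joseph", "Christopher",
              "Daniel", "George", "Kenneth", "Steven", "Ronald", "Anthony"]}

# Each list starts with the group name itself, followed by its members.
groups = {}
for a, b in zip(data['A'], data['B']):
    groups.setdefault(a, [a]).append(b)

for key in sorted(groups):
    print(groups[key])
```

Each group is then available by name, for example groups['James'].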
You can try this, using pd.Series.tolist():
for k, g in df.groupby('A')['B']:
    print([k] + g.tolist())
['Brian', 'Ronald', 'Anthony']
['Edward', 'David', 'Joseph']
['James', 'John', 'Michael', 'William']
['Jason', 'George', 'Kenneth', 'Steven']
['Thomas', 'Christopher', 'Daniel']
The reason you got None as output is that list.append returns None; it mutates the list in place.
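A short illustration of that behaviour, using made-up variable names:

```python
members = ['Ronald', 'Anthony']

# append changes `members` itself and returns None.
returned = members.append('Brian')

# List concatenation builds a new list and leaves its operands alone.
original = ['Ronald', 'Anthony']
combined = ['Brian'] + original
```

After this runs, returned is None, members has three elements, and combined is a new list while original is unchanged.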
Try the following:
import pandas as pd
data = {'A': ["James","James","James","Edward","Edward","Thomas","Thomas","Jason","Jason","Jason","Brian","Brian"],
'B' : ["John","Michael","William","David","Joseph","Christopher","Daniel","George","Kenneth","Steven","Ronald","Anthony"]}
df = pd.DataFrame(data)
df_1 = df.groupby('A')['B'].apply(list)
df_1 = df_1.to_frame().reset_index()
for index, row in df_1.iterrows():
    # The value in column A is a string, not a list, so wrap it in a
    # one-element list before concatenating it with the list in column B.
    print([row['A']] + row['B'])
output:
['Brian', 'Ronald', 'Anthony']
['Edward', 'David', 'Joseph']
['James', 'John', 'Michael', 'William']
['Jason', 'George', 'Kenneth', 'Steven']
['Thomas', 'Christopher', 'Daniel']

Get values from a list of dictionaries in a Pandas Dataframe

Okay, so I have a dataframe. Each element of column 'z' is a list of dictionaries.
For example, row two of column 'z' looks like this:
[ {'name': 'Tom', 'hw': [180, 79]},
{'name': 'Mark', 'hw': [119, 65]} ]
I would like it to just contain the 'name' values, in this case the element would be Tom and Mark without the 'hw' values.
I've tried converting it into a list, then removing every second element, but I lost which values came from the same row. Not every row has the same number of elements in it, some have 2 names, some might have 4.
One way using list comprehension with dict.get:
Example
df = pd.DataFrame({'z': [[{'name': 'Tom', 'hw': [180, 79]},
{'name': 'Mark', 'hw': [119, 65]}]]})
df['name'] = [[d.get('name') for d in x] for x in df['z']]
[out]
z name
0 [{'name': 'Tom', 'hw': [180, 79]}, {'name': 'M... [Tom, Mark]
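The same comprehension works on plain nested lists, which makes the effect of dict.get easy to see. This is a sketch with assumed data: the second row (Ann, Lee, and an entry without a 'name' key) is invented to show rows of different lengths and a missing key.

```python
rows = [
    [{'name': 'Tom', 'hw': [180, 79]}, {'name': 'Mark', 'hw': [119, 65]}],
    # Hypothetical second row: three entries, one of them without a name.
    [{'name': 'Ann', 'hw': [160, 55]}, {'hw': [170, 70]}, {'name': 'Lee', 'hw': [175, 68]}],
]

# dict.get returns None for a missing key instead of raising KeyError.
names_with_none = [[d.get('name') for d in row] for row in rows]

# Filtering on the key drops the entries that have no name at all.
names_only = [[d['name'] for d in row if 'name' in d] for row in rows]
```

Because the outer comprehension produces one list per row, the names stay grouped by the row they came from.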
You can also use the pandas Series.str.get accessor (this assumes each cell of the column, here called col, holds a single dict):
df['name']=df.col.str.get('name')
df
col name
0 {'name': 'Tom', 'hw': [180, 79]} Tom
1 {'name': 'Mark', 'hw': [119, 65]} Mark

Getting TypeError when trying to retrieve values from keys in a list of dictionaries

I have an array of dictionaries in a pandas DataFrame:
0 [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
1 [{'id': 12, 'name': 'Adventure'}, {'id': 88, 'name': 'Fantasy'}, {'id': 10751, 'name': 'Family'}]
2 [{'id': 10749, 'name': 'Romance'}, {'id': 77, 'name': 'Horror'}]
I am trying to get all the names from a single row into a simple list of Strings, like: "Horror, family, drama" etc for each row in the dataset.
I tried this code but I am getting the error: string indices must be integers
for y in df:
    names = [x['name'] for x in y]
Any help is appreciated.
Iterating over a DataFrame iterates over the names of its columns:
In [15]: df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
In [16]: df
Out[16]:
a b
0 1 4
1 2 5
2 3 6
In [17]: for x in df:
...: print(x)
...:
a
b
It is like a dict, which iterates over its keys.
You need something like:
df['your_column'].apply(lambda x: [d['name'] for d in x])
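The error itself can be reproduced without pandas by letting a plain dict stand in for the DataFrame, since both yield their keys (column names) when iterated. In this sketch the column name 'genres' and the variable names are assumptions made for illustration.

```python
# A dict of columns standing in for the DataFrame.
table = {
    'genres': [
        [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}],
        [{'id': 10749, 'name': 'Romance'}],
    ],
}

error_message = None
try:
    for y in table:                      # y is the string 'genres'
        names = [x['name'] for x in y]   # x is a single character of that string
except TypeError as exc:
    error_message = str(exc)

# Iterating over the column's values instead gives one list of names per row.
names_per_row = [[d['name'] for d in cell] for cell in table['genres']]
```

The caught message starts with "string indices must be integers", which matches the error reported in the question.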
IIUC, each element is a dict, not a list, so you should use .get:
[[y.get('name') for y in x] for x in df['your columns']]
Out[578]:
[['Animation', 'Comedy', 'Family'],
['Adventure', 'Fantasy', 'Family'],
['Romance', 'Horror']]
If the cells are stored as strings, convert them first:
import ast
df.a = df.a.apply(ast.literal_eval)
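What ast.literal_eval does to a single cell can be shown on one string. This is a small sketch; the variable names are illustrative, and the raw string is abbreviated from the first row in the question.

```python
import ast

# A cell that holds the text of a list of dicts rather than the list itself.
raw = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"

# literal_eval safely parses Python literals (lists, dicts, numbers, strings).
parsed = ast.literal_eval(raw)

names = [d['name'] for d in parsed]
joined = ", ".join(names)
```

Unlike eval, ast.literal_eval only accepts literals, so it does not execute arbitrary code found in the string.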

Python: Convert multiple list into an array of dictionary

Imagine that you have the following lists.
name = ['bob', 'kate', 'john']
age = [35, 12, 57]
gender = ["Male", "Female", "Male"]
How do you convert them to an array of dictionaries?
[
{
"name": "bob",
"age": 35,
"gender": "Male"
},
{
"name": "kate",
"age": 12,
"gender": "Female"
},
{
"name": "john",
"age": 57,
"gender": "Male"
}
]
A generic method which works for any number of lists with customizable field names
import pprint
def make_complex(**kwargs):
    return [dict(zip(kwargs.keys(), a)) for a in zip(*kwargs.values())]
name = ['bob', 'kate', 'john']
age = [35, 12, 57]
gender = ["Male", "Female", "Male"]
l = make_complex(name=name, age=age, gender=gender)
pprint.pprint(l)
l = make_complex(user=name, year=age, sex=gender)
pprint.pprint(l)
output:
[{'age': 35, 'gender': 'Male', 'name': 'bob'},
{'age': 12, 'gender': 'Female', 'name': 'kate'},
{'age': 57, 'gender': 'Male', 'name': 'john'}]
[{'sex': 'Male', 'user': 'bob', 'year': 35},
{'sex': 'Female', 'user': 'kate', 'year': 12},
{'sex': 'Male', 'user': 'john', 'year': 57}]
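One thing to keep in mind with any zip-based approach is that zip stops at the shortest input, so lists of unequal length are truncated silently. The following is a hedged variant, not the answer's own code: the function name make_records and the choice to raise ValueError on a length mismatch are assumptions.

```python
def make_records(**columns):
    """Build one dict per row from equally long keyword-argument lists."""
    lengths = {field: len(values) for field, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # zip would otherwise drop the extra items without any warning.
        raise ValueError(f"columns have different lengths: {lengths}")
    return [dict(zip(columns, row)) for row in zip(*columns.values())]


name = ['bob', 'kate', 'john']
age = [35, 12, 57]
gender = ["Male", "Female", "Male"]

records = make_records(name=name, age=age, gender=gender)
```

On Python 3.10 and later, zip(..., strict=True) offers a built-in way to get a similar check.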
Using zip and a list comprehension:
Code:
name = ['bob', 'kate', 'john']
age = [35, 12, 57]
gender = ["Male", "Female", "Male"]
dic= [ {"name":val[0], "age":val[1], "gender":val[2]} for val in zip(name, age, gender)]
Output:
[{'name':'bob','age':35,'gender':'Male'},
{'name':'kate','age':12,'gender':'Female'},
{'name':'john','age':57,'gender':'Male'}]
Using a simple loop, it would look something like:
name = ['bob', 'kate', 'john']
age = [35, 12, 57]
gender = ["Male", "Female", "Male"]
records = []  # not named `list`, which would shadow the built-in
for i in range(len(name)):
    temp = {}
    temp['name'] = name[i]
    temp['age'] = age[i]
    temp['gender'] = gender[i]
    records.append(temp)
Using a list comprehension and itertools (Python 2; on Python 3, itertools.izip no longer exists and the built-in zip is already lazy):
import itertools
d = [{'name': n, 'age': a, 'gender': g} for n, a, g in itertools.izip(name, age, gender)]
Use list comprehension.
In [3]: [{"name":n,"age":a,"gender":g} for n,a,g in zip(name, age, gender)]
Out[3]:
[{'age': 35, 'gender': 'Male', 'name': 'bob'},
{'age': 12, 'gender': 'Female', 'name': 'kate'},
{'age': 57, 'gender': 'Male', 'name': 'john'}]
or,
In [5]: [dict(zip(['name','age','gender'], t)) for t in zip(name, age, gender)]
Out[5]:
[{'age': 35, 'gender': 'Male', 'name': 'bob'},
{'age': 12, 'gender': 'Female', 'name': 'kate'},
{'age': 57, 'gender': 'Male', 'name': 'john'}]
Go for this.
name = ['bob', 'kate', 'john']
age = [35, 12, 57]
gender = ["Male", "Female", "Male"]
keys = [name, age, gender]  # If there is more data to add, just change this one place
def get_var_name(var):
    # Look up the global variable name bound to this exact object.
    for k, v in list(globals().items()):
        if v is var:
            return k
d = []
for i in range(len(keys[0])):
    d.append({})
    for key in keys:
        d[i][get_var_name(key)] = key[i]
print(d)
Or use a dict comprehension to avoid the inner loop:
d = []
for i in range(len(name)):
    d.append({get_var_name(key): key[i] for key in keys})
print(d)
To make it a one-liner, combine an inner dict comprehension with an outer list comprehension:
print([{get_var_name(key): key[i] for key in keys} for i in range(len(keys[0]))])
