Iterate over DataFrame rows in pandas - Python

I want to iterate over the rows of a pandas DataFrame and get a flat list of dicts from the rows:
import pandas as pd
df = pd.read_json("inputfile.txt")
data
0 {'M': {'1': 'data', '2': 'data2'}}
1 {'M': {'3': '555', '5': '3333'}}
data = []
for _, row in df.iterrows():
    d = [{k1 + k2: v2 for k1, v1 in x.items() for k2, v2 in v1.items()} for x in row]
    data.append(d)
print(data)
I am getting output like this:
[[{'M1': 'data', 'M2': 'data2'}], [{'M3': '555', 'M5': '3333'}]]
but I need output like this:
[{'M1': 'data', 'M2': 'data2'}, {'M3': '555', 'M5': '3333'}]

Use .extend() instead of .append(): extend() adds every item of the list passed as its argument to the end of the list, whereas append() adds the whole list as a single element.
Example:
import pandas as pd
data = {'data': [{'M': {'1': 'data', '2': 'data2'}}, {'M': {'3': '555', '5': '3333'}}]}
df = pd.DataFrame(data)
print(df)
result = []
for row in df.iterrows():
    # row is an (index, Series) tuple; row[1].items() yields (column, cell) pairs
    x = [{"{0}{1}".format(k, k1): v1 for k, v in item[1].items() for k1, v1 in v.items()}
         for item in row[1].items()]
    result.extend(x)
print(result)
Or, as a single list comprehension:
x = [{"{0}{1}".format(k, k1): v1 for k, v in item[1].items() for k1, v1 in v.items()}
     for row in df.iterrows()
     for item in row[1].items()]
print(x)
Output:
data
0 {'M': {'1': 'data', '2': 'data2'}}
1 {'M': {'3': '555', '5': '3333'}}
[{'M1': 'data', 'M2': 'data2'}, {'M3': '555', 'M5': '3333'}]
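The difference this answer relies on can be seen with plain lists: append adds its argument as one element, while extend adds each element of its argument. A minimal sketch, where the nested `rows` list simply mimics what the question's loop produces per row:

```python
# Each row produces a one-element list, as in the question's loop
rows = [[{'M1': 'data', 'M2': 'data2'}], [{'M3': '555', 'M5': '3333'}]]

appended = []
extended = []
for row_list in rows:
    appended.append(row_list)  # adds the list itself -> nested result
    extended.extend(row_list)  # adds the list's items -> flat result

print(appended)  # [[{'M1': 'data', 'M2': 'data2'}], [{'M3': '555', 'M5': '3333'}]]
print(extended)  # [{'M1': 'data', 'M2': 'data2'}, {'M3': '555', 'M5': '3333'}]
```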

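By doing:
d = [{k1+k2:v2 for k1,v1 in x.items() for k2,v2 in v1.items()} for x in row]
you are creating a list and appending that whole list to data, so each row's dict ends up nested inside its own list. Build a single dict per row and append that instead.
Modified code:
import pandas as pd
df = pd.DataFrame({'data': [{'M': {'1': 'data', '2': 'data2'}}, {'M': {'3': '555', '5': '3333'}}]})
data = []
for row in df.iterrows():
    # row[1] is the row's Series; iterating it yields the cell values (the nested dicts)
    d = {k1 + k2: v2 for x in row[1] for k1, v1 in x.items() for k2, v2 in v1.items()}
    data.append(d)
print(data)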
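Since every row holds a single nested dict in the data column, iterrows() is arguably not needed at all. A minimal sketch that flattens the column values directly; `flatten_row` is a hypothetical helper name, and the plain list `column` stands in for df['data'] (iterating a pandas Series yields its values the same way), so the snippet runs without pandas:

```python
def flatten_row(cell):
    """Hypothetical helper: turn {'M': {'1': 'data'}} into {'M1': 'data'}."""
    # Join each outer key with each inner key; keep the inner value
    return {outer + inner: value
            for outer, sub in cell.items()
            for inner, value in sub.items()}

# Stand-in for df['data']
column = [{'M': {'1': 'data', '2': 'data2'}}, {'M': {'3': '555', '5': '3333'}}]

data = [flatten_row(cell) for cell in column]
print(data)  # [{'M1': 'data', 'M2': 'data2'}, {'M3': '555', 'M5': '3333'}]
```

With a real DataFrame, the last step would be `data = [flatten_row(cell) for cell in df['data']]`.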

Related

Combining multiple nested dictionaries in python

I have multiple nested dictionaries with different levels, and I would like to combine them on the same key. Here are three examples:
dict_1={'D': {'D': '1','B': '2','A': '3'},'A': {'A': '5','J': '6'}}
dict_2={'D': {'D': '7', 'B': '8', 'C': '9'},'A': {'A': '12', 'C':'13'}}
dict_3={'D': {'test1': '14','test2': '3'},'B': {'test1': '21','test2': '16'},'A': {'test1': '3','test2': '2'},'J': {'test1': '15','test2': '3'}, 'C':{'test1': '44','test2': '33'}}
I want to combine these three by adding the keys and values from dict_3 to the combination of dict_1 and dict_2 for each key, labelling the dict_1 and dict_2 values as 'dict_1_value' and 'dict_2_value':
main_dict = {
    'D': {
        'D': {'dict_1_value': '1', 'dict_2_value': '7', 'test1': '14', 'test2': '3'},
        'B': {'dict_1_value': '2', 'dict_2_value': '8', 'test1': '21', 'test2': '16'},
        'A': {'dict_1_value': '3', 'test1': '3', 'test2': '2'},
        'C': {'dict_2_value': '9', 'test1': '44', 'test2': '33'}},
    'A': {
        'A': {'dict_1_value': '5', 'dict_2_value': '12', 'test1': '3', 'test2': '2'},
        'J': {'dict_1_value': '6', 'test1': '15', 'test2': '3'},
        'C': {'dict_2_value': '13', 'test1': '44', 'test2': '33'}}
}
At first, I tried to combine dict_1 and dict_2, but the values from dict_1 get overwritten when I try things such as {k: v | dict_2[k] for k, v in dict_1.items()} or dict(**dict_1, **dict_2). Moreover, I don't know how to add dict_3 while naming the values 'dict_1_value' or 'dict_2_value'.
Is there any way to accomplish main_dict?
# Note: the dict union operator | used below requires Python 3.9+
all_keys = set(dict_1.keys()).union(dict_2.keys())
# Wrap every leaf value in a dict labelled with its source
temp_1 = {key: {k: {'dict_1_value': v} for k, v in sub.items()} for key, sub in dict_1.items()}
temp_2 = {key: {k: {'dict_2_value': v} for k, v in sub.items()} for key, sub in dict_2.items()}
combined = {}
for key in all_keys:
    sub_1 = temp_1.get(key, {})
    sub_2 = temp_2.get(key, {})
    sub_keys = set(sub_1.keys()).union(sub_2.keys())
    # Merge the labelled dicts; a key missing on one side contributes {}
    combined[key] = {k: sub_1.get(k, {}) | sub_2.get(k, {}) for k in sub_keys}
Now there are two options:
Option 1: a dictionary comprehension, which constructs the new dictionary from scratch:
main_dict = {key: {k: v | dict_3.get(k, {})
                   for k, v in sub.items()}
             for key, sub in combined.items()}
Option 2: a loop, which updates the items of the existing dictionary in place:
for key, sub in combined.items():
    for k, v in sub.items():
        v.update(dict_3.get(k, {}))
main_dict = combined
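The same result can also be built in a single pass without the temp dicts. A minimal sketch, where `combine` is a hypothetical helper name; it uses setdefault/update instead of `|`, so it should also run on Python versions before 3.9:

```python
def combine(dict_1, dict_2, dict_3):
    """Hypothetical helper: label dict_1/dict_2 leaves, then add dict_3's info."""
    result = {}
    # Outer keys from both dicts in first-seen order (dict.fromkeys drops duplicates)
    for outer in dict.fromkeys([*dict_1, *dict_2]):
        inner = {}
        for label, source in (('dict_1_value', dict_1), ('dict_2_value', dict_2)):
            for k, v in source.get(outer, {}).items():
                # Create the inner dict on first sight, then store the labelled value
                inner.setdefault(k, {})[label] = v
        # Attach the extra information from dict_3 (if any) to each inner key
        for k, fields in inner.items():
            fields.update(dict_3.get(k, {}))
        result[outer] = inner
    return result


dict_1 = {'D': {'D': '1', 'B': '2', 'A': '3'}, 'A': {'A': '5', 'J': '6'}}
dict_2 = {'D': {'D': '7', 'B': '8', 'C': '9'}, 'A': {'A': '12', 'C': '13'}}
dict_3 = {'D': {'test1': '14', 'test2': '3'}, 'B': {'test1': '21', 'test2': '16'},
          'A': {'test1': '3', 'test2': '2'}, 'J': {'test1': '15', 'test2': '3'},
          'C': {'test1': '44', 'test2': '33'}}

main_dict = combine(dict_1, dict_2, dict_3)
print(main_dict)
```

Because the inner dicts are created fresh inside the helper, the input dictionaries are not modified.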

How to get individual fields in list of dictionary

I have a list of dictionaries called dictList that has data like so:
[{'id': '5', 'total': '39'}, {'id': '5', 'total': '43'}].
I am trying to create a new dictionary that uses the id as the key and total as the value.
So I tried this:
keys = [d['id'] for d in dictList]
values = [d['total'] for d in dictList]
new_dict[str(keys)]= values
However the output is: {"['5', '5']": [39, 43]}
I am not sure what is going on. I am just trying to get each id with its respective total, like 5, 39 and 5, 43, into new_dict.
EDIT:
Please note that dictList contains all the products with ID 5. There are other fields, but I didn't include them.
One approach:
data = [{'id': '5', 'total': '39'}, {'id': '5', 'total': '43'}]
res = {}
for d in data:
    key = d["id"]
    if key not in res:
        res[key] = 0
    res[key] += int(d["total"])
print(res)
Output
{'5': 82}
Alternative using collections.defaultdict:
from collections import defaultdict
data = [{'id': '5', 'total': '39'}, {'id': '5', 'total': '43'}]
res = defaultdict(int)
for d in data:
    key = d["id"]
    res[key] += int(d["total"])
print(res)
Output
defaultdict(<class 'int'>, {'5': 82})
Use sorted and itertools.groupby to group by the 'id' key of each list element:
import itertools
dictList = [{'id': '5', 'total': '39'}, {'id': '10', 'total': '10'},
            {'id': '5', 'total': '43'}, {'id': '10', 'total': '22'}]
# groupby only groups consecutive items, so sort by the same key first
groups = itertools.groupby(sorted(dictList, key=lambda item: item['id']),
                           key=lambda item: item['id'])
Next, take the sum of each group:
product_totals = {
    key: sum(int(item['total']) for item in grp)
    for key, grp in groups
}
Which gives:
{'10': 32, '5': 82}
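The sort matters because itertools.groupby only merges consecutive items with equal keys. A small sketch of what happens with and without sorting, using the same dictList (the helper `key` is just a named version of the lambda above):

```python
import itertools

dictList = [{'id': '5', 'total': '39'}, {'id': '10', 'total': '10'},
            {'id': '5', 'total': '43'}, {'id': '10', 'total': '22'}]

def key(item):
    return item['id']

# Unsorted: every change of id starts a new group, so each id shows up twice
unsorted_groups = [(k, [int(d['total']) for d in grp])
                   for k, grp in itertools.groupby(dictList, key=key)]
print(unsorted_groups)  # [('5', [39]), ('10', [10]), ('5', [43]), ('10', [22])]

# Sorted first: equal ids are adjacent, so each id forms exactly one group
sorted_groups = [(k, [int(d['total']) for d in grp])
                 for k, grp in itertools.groupby(sorted(dictList, key=key), key=key)]
print(sorted_groups)  # [('10', [10, 22]), ('5', [39, 43])]
```

Without the sort, building a dict from the groups would silently keep only the last group for each id.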
If you have many such entries, you could consider loading them into a pandas DataFrame; pandas has vectorized methods that help you crunch numbers faster. The idea behind summing the totals is the same, except that here we don't need to sort, because DataFrame.groupby takes care of the grouping for us.
>>> import pandas as pd
>>> df = pd.DataFrame(dictList)
>>> df['total'] = df['total'].astype(int)
>>> df
id total
0 5 39
1 10 10
2 5 43
3 10 22
>>> df.groupby('id').total.sum()
id
10 32
5 82
Name: total, dtype: int32
>>> df.groupby('id').total.sum().to_dict()
{'10': 32, '5': 82}
Although I'm not sure exactly what you are trying to do, try this:
new_dict = {}
for d in dictList:
    if d["id"] in new_dict:
        new_dict[d["id"]] += int(d["total"])
    else:
        new_dict[d["id"]] = int(d["total"])
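If the goal is to keep each total rather than sum them (the question mentions wanting both 5, 39 and 5, 43), keep in mind that a dict holds only one value per key, so totals for the same id have to be collected in a list. A minimal sketch using dict.setdefault, assuming the totals should be converted to int:

```python
dictList = [{'id': '5', 'total': '39'}, {'id': '5', 'total': '43'}]

totals_by_id = {}
for d in dictList:
    # setdefault creates an empty list the first time an id is seen
    totals_by_id.setdefault(d['id'], []).append(int(d['total']))

print(totals_by_id)  # {'5': [39, 43]}
```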

Create dictionary from 2 columns of Dataframe

I have a dataframe:
df = pd.DataFrame({
    'ID': ['1', '4', '4', '3', '3', '3'],
    'club': ['arts', 'math', 'theatre', 'poetry', 'dance', 'cricket']
})
Note: Both the columns of the data frame can have repeated values.
I want to create a dictionary that maps every ID to its unique club names.
It should look like this:
{
{'1':'arts'}, {'4':'math','theatre'}, {'3':'poetry','dance','cricket'}
}
Kindly help me with this
Try groupby() and then to_dict():
grouped = df.groupby("ID")["club"].apply(set)
print(grouped)
> ID
1 {arts}
3 {cricket, poetry, dance}
4 {math, theatre}
grouped_dict = grouped.to_dict()
print(grouped_dict)
> {'1': {'arts'}, '3': {'cricket', 'poetry', 'dance'}, '4': {'math', 'theatre'}}
Edit:
Changed to .apply(set) to get sets.
You can use a defaultdict:
from collections import defaultdict
d = defaultdict(set)
for k, v in zip(df['ID'], df['club']):
    d[k].add(v)
dict(d)
output:
{'1': {'arts'}, '4': {'math', 'theatre'}, '3': {'cricket', 'dance', 'poetry'}}
Or, for a format closer to the requested output:
[{k:v} for k,v in d.items()]
output:
[{'1': {'arts'}},
{'4': {'math', 'theatre'}},
{'3': {'cricket', 'dance', 'poetry'}}]
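Note that the output sketched in the question is not valid Python: a set cannot contain dicts, and {'4':'math','theatre'} mixes a key/value pair with a bare item. A dict of lists (or sets) is the closest valid shape. A pure-Python sketch that drops duplicate clubs but keeps them in first-seen order (sets do not preserve order); the two plain lists stand in for df['ID'] and df['club']:

```python
# Stand-ins for df['ID'] and df['club']; zip works the same on pandas Series
ids = ['1', '4', '4', '3', '3', '3']
clubs = ['arts', 'math', 'theatre', 'poetry', 'dance', 'cricket']

grouped = {}
for club_id, club in zip(ids, clubs):
    # A dict with None values acts as an insertion-ordered set of club names
    grouped.setdefault(club_id, {})[club] = None

# Turn each ordered "set" into a list
result = {club_id: list(names) for club_id, names in grouped.items()}
print(result)  # {'1': ['arts'], '4': ['math', 'theatre'], '3': ['poetry', 'dance', 'cricket']}
```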

Switch key and value in a dictionary of sets

I have a dictionary like this:
d1 = {'0': {'a'}, '1': {'b'}, '2': {'c', 'd'}, '3': {'E','F','G'}}
and I want a result like this:
d2 = {'a': '0', 'b': '1', 'c': '2', 'd': '2', 'E': '3', 'F': '3', 'G': '3'}
So I tried
d2 = dict((v, k) for k, v in d1.items())
but each value is a set rather than a single item, so it didn't work.
Is there any way I can fix it?
You could use a dictionary comprehension:
{v:k for k,vals in d1.items() for v in vals}
# {'a': '0', 'b': '1', 'c': '2', 'd': '2', 'E': '3', 'F': '3', 'G': '3'}
Note that you need an extra level of iteration over the values in each key here to get a flat dictionary.
Another dict comprehension:
>>> {v: k for k in d1 for v in d1[k]}
{'a': '0', 'b': '1', 'c': '2', 'd': '2', 'E': '3', 'F': '3', 'G': '3'}
Benchmark comparison with yatu's:
from timeit import repeat
setup = "d1 = {'0': {'a'}, '1': {'b'}, '2': {'c', 'd'}, '3': {'E','F','G'}}"
yatu = "{v:k for k,vals in d1.items() for v in vals}"
heap = "{v:k for k in d1 for v in d1[k]}"
for _ in range(3):
    print('yatu', min(repeat(yatu, setup)))
    print('heap', min(repeat(heap, setup)))
    print()
Results:
yatu 1.4274586000000227
heap 1.4059823000000051
yatu 1.4562267999999676
heap 1.3701727999999775
yatu 1.4313863999999512
heap 1.3878657000000203
Another benchmark, with a million keys/values:
setup = "d1 = {k: {k+1, k+2} for k in range(0, 10**6, 3)}"
for _ in range(3):
    print('yatu', min(repeat(yatu, setup, number=10)))
    print('heap', min(repeat(heap, setup, number=10)))
    print()
yatu 1.071519999999964
heap 1.1391495000000305
yatu 1.0880677000000105
heap 1.1534022000000732
yatu 1.0944767999999385
heap 1.1526202000000012
Here's another possible solution to the given problem:
def flatten_dictionary(dct):
    d = {}
    for k, st_values in dct.items():
        for v in st_values:
            d[v] = k
    return d
if __name__ == '__main__':
    d1 = {'0': {'a'}, '1': {'b'}, '2': {'c', 'd'}, '3': {'E', 'F', 'G'}}
    d2 = flatten_dictionary(d1)
    print(d2)
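All of these inversions assume each element appears in only one set; if an element were shared by two sets, the later key would silently overwrite the earlier one. A sketch that raises an error instead, where `invert_unique` is a hypothetical helper name:

```python
def invert_unique(dct):
    """Hypothetical helper: invert a dict of sets, refusing to overwrite on collisions."""
    inverted = {}
    for k, values in dct.items():
        for v in values:
            if v in inverted:
                raise ValueError(f"{v!r} appears under both {inverted[v]!r} and {k!r}")
            inverted[v] = k
    return inverted


d1 = {'0': {'a'}, '1': {'b'}, '2': {'c', 'd'}, '3': {'E', 'F', 'G'}}
print(invert_unique(d1))

try:
    invert_unique({'0': {'a'}, '1': {'a', 'b'}})
except ValueError as exc:
    print(exc)  # 'a' appears under both '0' and '1'
```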

Convert dictionary to python dataframe which has key value pair

My dictionary is:
{'id': '6576_926_1',
'name': 'xyz',
'm': 926,
0: {'id': '2896_926_2',
'name': 'lmn',
'm': 926},
1: {'id': '23_926_3',
'name': 'abc',
'm': 928}}
and I want to convert it into a DataFrame like this:
Id Name M
6576_926_1 Xyz 926
2896_926_2 Lmn 926
23_926_3 Abc 928
I am fine even if the first row is not included, since it doesn't have an index. There are around 1.3 million records, so speed is very important. I tried using a for loop with append, and it takes forever.
Since you mentioned that the first row is not mandatory, I have left it out here. Hope this solves your problem:
import pandas as pd
lis = []
data = {
    0: {'id': '2896_926_2', 'name': 'lmn', 'm': 926},
    1: {'id': '23_926_3', 'name': 'abc', 'm': 928}
}
for key, val in data.items():
    lis.append(val)
d = pd.DataFrame(lis)
print(d)
Output:
id m name
0 2896_926_2 926 lmn
1 23_926_3 928 abc
And if you want id as your index, add set_index:
lis = []
for i, j in data.items():
    lis.append(j)
d = pd.DataFrame(lis)
d = d.set_index('id')
print(d)
Output:
m name
id
2896_926_2 926 lmn
23_926_3 928 abc
You can use a loop to convert each dictionary's entries into a list, and then use pandas' DataFrame.from_dict to convert the result to a DataFrame. Here is the example from the pandas documentation:
>>> data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
>>> pd.DataFrame.from_dict(data)
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
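Use the following approach (my_dict is the dictionary from the question):
import pandas as pd
data = pd.DataFrame(my_dict)
# Drop the columns built from the top-level scalar fields, then transpose
# so that each nested dict becomes a row
data = data.drop(columns=['id', 'name', 'm']).T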
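You can also try this (again with my_dict as the question's dictionary; note that del modifies it in place):
import pandas as pd
del my_dict['id']
del my_dict['name']
del my_dict['m']
# Transpose so each nested dict becomes a row rather than a column
pd.DataFrame(my_dict).T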
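Try this code; its complexity is still O(n):
import pandas as pd
my_dict.pop('id')
my_dict.pop('name')
my_dict.pop('m')
# Each remaining value is a nested dict; list() turns its values into a plain row
data = [list(row.values()) for row in my_dict.values()]
pd.DataFrame(data=data, columns=['id', 'name', 'm'])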
import pandas as pd
data = {'id': '6576_926_1', 'name': 'xyz', 'm': 926, 0: {'id': '2896_926_2', 'name': 'lmn', 'm': 926}, 1: {'id': '23_926_3', 'name': 'abc', 'm': 928}}
Id = []
Name = []
M = []
for k, val in data.items():
    # Only the nested dicts hold the rows; skip the top-level scalar fields
    if isinstance(val, dict):
        Id.append(val['id'])
        Name.append(val['name'])
        M.append(val['m'])
df = pd.DataFrame({'Id': Id, 'Name': Name, 'M': M})
print(df)
mydict = {'id': '6576_926_1',
'name': 'xyz',
'm': 926,
0: {'id': '2896_926_2',
'name': 'lmn',
'm': 926},
1: {'id': '23_926_3',
'name': 'abc',
'm': 928}}
import pandas as pd
del mydict['id']
del mydict['name']
del mydict['m']
d = pd.DataFrame(mydict).T
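The del/pop answers mutate the original dictionary. A non-destructive sketch keeps only the nested dicts with a comprehension and lets DataFrame.from_dict with orient='index' turn each one into a row. Building the frame with one constructor call, rather than appending row by row, is likely what matters most for the 1.3 million records mentioned in the question, although that has not been benchmarked here:

```python
import pandas as pd

mydict = {'id': '6576_926_1', 'name': 'xyz', 'm': 926,
          0: {'id': '2896_926_2', 'name': 'lmn', 'm': 926},
          1: {'id': '23_926_3', 'name': 'abc', 'm': 928}}

# Keep only the nested records; the top-level scalar fields are skipped,
# which the question says is acceptable. mydict itself is left unchanged.
nested = {key: value for key, value in mydict.items() if isinstance(value, dict)}

# orient='index': each outer key becomes a row label, each inner key a column
df = pd.DataFrame.from_dict(nested, orient='index')
print(df)
```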
