Convert a dictionary with key-value pairs to a pandas DataFrame - python

I have my dictionary as
{'id': '6576_926_1',
'name': 'xyz',
'm': 926,
0: {'id': '2896_926_2',
'name': 'lmn',
'm': 926},
1: {'id': '23_926_3',
'name': 'abc',
'm': 928}}
And I want to convert it into a dataframe like
Id Name M
6576_926_1 Xyz 926
2896_926_2 Lmn 926
23_926_3 Abc 928
I am fine even if the first row is not available, as it doesn't have an index. There are around 1.3 million records, so speed is very important. I tried using a for loop with an append statement and it takes forever.

Since you mentioned that the first row is not mandatory for you, I've tried this. Hope it solves your problem:
import pandas as pd

lis = []
data = {
    0: {'id': '2896_926_2', 'name': 'lmn', 'm': 926},
    1: {'id': '23_926_3', 'name': 'abc', 'm': 928}
}
# collect the inner dicts in a list, then build the DataFrame once
for key, val in data.items():
    lis.append(val)
d = pd.DataFrame(lis)
print(d)
Output--
id m name
0 2896_926_2 926 lmn
1 23_926_3 928 abc
And if you want id as your index, then add set_index:
lis = []
for i, j in data.items():
    lis.append(j)
d = pd.DataFrame(lis)
d = d.set_index('id')
print(d)
Output-
m name
id
2896_926_2 926 lmn
23_926_3 928 abc
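Since the question mentions around 1.3 million records, it is usually much faster to build the list in one pass and construct the DataFrame in a single call instead of appending rows one at a time. A minimal sketch, assuming full_data (hypothetical name) holds the complete dictionary from the question, including the scalar top-level keys:
import pandas as pd

# full_data: the complete dictionary from the question (hypothetical name)
# keep only the nested record dicts, then build the DataFrame in a single call
records = [v for v in full_data.values() if isinstance(v, dict)]
d = pd.DataFrame(records)
print(d)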

You can use a loop to convert each dictionary's entries into a list, and then use pandas' .from_dict to convert to a DataFrame. Here's the example given in the documentation:
>>> data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
>>> pd.DataFrame.from_dict(data)
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
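Applied to the nested dictionary from the question, a minimal sketch could look like this (src is just a hypothetical name for that dictionary; the scalar top-level entries are filtered out first, since orient='index' expects every value to be a mapping):
>>> src = {'id': '6576_926_1', 'name': 'xyz', 'm': 926,
...        0: {'id': '2896_926_2', 'name': 'lmn', 'm': 926},
...        1: {'id': '23_926_3', 'name': 'abc', 'm': 928}}
>>> nested = {k: v for k, v in src.items() if isinstance(v, dict)}
>>> pd.DataFrame.from_dict(nested, orient='index')
The keys 0 and 1 become the index and the inner keys (id, name, m) become the columns.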

Use the following approach (here my_dict is the dictionary from the question):
import pandas as pd
data = pd.DataFrame(my_dict)
data = data.drop(0, axis=1)
data = data.drop(1, axis=1)
You can also try this:
import pandas as pd
del my_dict['id']
del my_dict['name']
del my_dict['m']
pd.DataFrame(my_dict).T

Try this code. The complexity is still O(n):
import pandas as pd

my_dict.pop('id')
my_dict.pop('name')
my_dict.pop('m')
# relies on the inner dicts having their keys in the order id, name, m
data = [list(row.values()) for row in my_dict.values()]
pd.DataFrame(data=data, columns=['id', 'name', 'm'])

import pandas as pd

data = {'id': '6576_926_1', 'name': 'xyz', 'm': 926,
        0: {'id': '2896_926_2', 'name': 'lmn', 'm': 926},
        1: {'id': '23_926_3', 'name': 'abc', 'm': 928}}
Id = []
Name = []
M = []
for k, val in data.items():
    if type(val) is dict:
        Id.append(val['id'])
        Name.append(val['name'])
        M.append(val['m'])
df = pd.DataFrame({'Name': Name, 'Id': Id, 'M': M})
print(df)

mydict = {'id': '6576_926_1',
'name': 'xyz',
'm': 926,
0: {'id': '2896_926_2',
'name': 'lmn',
'm': 926},
1: {'id': '23_926_3',
'name': 'abc',
'm': 928}}
import pandas as pd
del mydict['id']
del mydict['name']
del mydict['m']
d = pd.DataFrame(mydict).T

Related

How to get individual fields in list of dictionary

I have a list of dictionaries called dictList that has data like so:
[{'id': '5', 'total': '39'}, {'id': '5', 'total': '43'}].
I am trying to create a new dictionary that uses the id as the key and total as the value.
So I have tried this:
keys = [d['id'] for d in dictList]
values = [d['total'] for d in dictList]
new_dict[str(keys)]= values
However the output is: {"['5', '5']": [39, 43]}
I am not sure what is going on; I am just trying to get each id and its respective total, like 5, 39 and 5, 43, into new_dict.
EDIT:
Please note that dictList contains all the products with ID 5. There are other fields, but I didn't include them.
One approach:
data = [{'id': '5', 'total': '39'}, {'id': '5', 'total': '43'}]
res = {}
for d in data:
    key = d["id"]
    if key not in res:
        res[key] = 0
    res[key] += int(d["total"])
print(res)
Output
{'5': 82}
Alternative using collections.defaultdict:
from collections import defaultdict

data = [{'id': '5', 'total': '39'}, {'id': '5', 'total': '43'}]
res = defaultdict(int)
for d in data:
    key = d["id"]
    res[key] += int(d["total"])
print(res)
Output
defaultdict(<class 'int'>, {'5': 82})
Use sorted and itertools.groupby to group by the 'id' key of each list element:
import itertools
dictList = [{'id': '5', 'total': '39'}, {'id': '10', 'total': '10'},
            {'id': '5', 'total': '43'}, {'id': '10', 'total': '22'}]
groups = itertools.groupby(sorted(dictList, key=lambda item: item['id']),
                           key=lambda item: item['id'])
Next, take the sum of each group:
product_totals = {
    key: sum(int(item['total']) for item in grp)
    for key, grp in groups
}
Which gives:
{'10': 32, '5': 82}
If you have lots of such entries, you could consider using pandas to create a dataframe. Pandas has vectorized methods that help you crunch numbers faster. The idea behind finding the sum of totals is the same, except in this case we don't need to sort because pandas.groupby takes care of that for us
>>> import pandas as pd
>>> df = pd.DataFrame(dictList)
>>> df['total'] = df['total'].astype(int)
>>> df
id total
0 5 39
1 10 10
2 5 43
3 10 22
>>> df.groupby('id').total.sum()
id
10 32
5 82
Name: total, dtype: int32
>>> df.groupby('id').total.sum().to_dict()
{'10': 32, '5': 82}
Although I'm not sure what you are trying to do, try this:
new_dict = {}
for d in dictList:
    if d["id"] in new_dict:
        new_dict[d["id"]] += int(d["total"])
    else:
        new_dict[d["id"]] = int(d["total"])

Iterate over a pandas column of lists with a dictionary

I have a list in a pandas dataframe:
0: [car, telephone]
1: [computer, beach, book, language]
2: [rice, bus, street]
Each row contains one list, and the lists have different lengths in different rows.
and I have a dictionary:
dict = {'car': 'transport',
        'rice': 'food',
        'book': 'reading'
        }
After that I have flattened the dict
d = {val:key for key, lst in dict.items() for val in lst}
I would like to iterate over all the items in each list and create a column of this kind,
this is the desired output:
index col1 col2
0: [car, telephone],transport
1: [computer, beach, book, language], reading
2: [rice, bus, street], food
I have tried:
df['col2'] = data_df['col1'].index.map(d)
but I get
col2
NaN
NaN
NaN
You can .explode, then use the dictionary for translation, and then align back on the original index:
Sample data:
import pandas as pd
data = {'id': {0: 1, 1: 2, 2: 3}, 'col': {0: ['car', 'telephone'], 1: ['computer', 'beach', 'book', 'language'], 2: ['rice', 'bus', 'street']}}
df = pd.DataFrame(data)
dct = {'car': 'transport', 'rice':'food', 'book':'reading'}
Code:
df2 = df.explode('col')
df2['col2'] = df2['col'].replace(dct)
df['col2'] = df2[~df2['col'].eq(df2['col2'])]['col2']
Output:
id col col2
0 1 [car, telephone] transport
1 2 [computer, beach, book, language] reading
2 3 [rice, bus, street] food
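An equivalent sketch that maps each exploded value through the dictionary and then groups back to the original row index (same df and dct as above; it keeps only the first match per row):
exploded = df.explode('col')['col']
matched = exploded.map(dct)                     # NaN where a word has no translation
df['col2'] = matched.groupby(level=0).first()   # first non-null match per original row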
You can use apply on a custom function:
import pandas as pd
df = pd.DataFrame([{'col1': ['car', 'telephone']}, {'col1': ['computer', 'beach', 'book', 'language']}, {'col1': ['rice', 'bus', 'street']}])
def get_col2(lst):
    d = {'car': 'transport', 'rice': 'food', 'book': 'reading'}
    for k, v in d.items():
        if k in lst:
            return v

df['col2'] = df['col1'].apply(get_col2)
Output:
   col1                                        col2
0  ['car', 'telephone']                        transport
1  ['computer', 'beach', 'book', 'language']   reading
2  ['rice', 'bus', 'street']                   food

Convert pandas column of type list to dict

A pandas column of length n is of type list:
df['size'][0] = [{'Name': 'Total', 'Value': 50, 'Unit': 'Units'}]
type(df['Size'][0])
list
I'd like to convert the list to a dictionary, i.e. so that type(df['Size'][0]) returns dict:
{'Name': 'Total',
'Value': 50,
'Unit': 'Units'}
For context, I am trying to parse out the dictionary into multiple columns.
# Unpack Size
for i, row in df.iterrows():
    if type(row['Size'][0]) is dict:
        dict_obj = row['Size'][0]
        for key, val in dict_obj.items():
            if key == 'Name':
                df.loc[i, 'Size_Name'] = val
            if key == 'Value':
                df.loc[i, 'Size_Value'] = val
            if key == 'Unit':
                df.loc[i, 'Size_Unit'] = val
There can be any number of dictionaries in the list.
When you have an arbitrary number of dictionaries in the list, use df.explode:
df = pd.DataFrame({'size':[[{'a':1},{'b':1}],[{'a':2}],[{'c':2},{'d':2},{'e':4}]]})
df
size
0 [{'a': 1}, {'b': 1}]
1 [{'a': 2}]
2 [{'c': 2}, {'d': 2}, {'e': 4}]
df.explode('size')
size
0 {'a': 1}
0 {'b': 1}
1 {'a': 2}
2 {'c': 2}
2 {'d': 2}
2 {'e': 4}
If it's always a list of one dictionary, i.e. df['size'][x] = [{...}], use itertools.chain.from_iterable:
from itertools import chain
df['size'] = list(chain.from_iterable(df['size']))
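Since the question's goal is to parse the dictionary out into multiple columns, one possible follow-up once each cell holds a single dict is json_normalize. A sketch, assuming pandas 1.0+, a default RangeIndex, and the 'Name'/'Value'/'Unit' keys from the question:
import pandas as pd

parsed = pd.json_normalize(df['size'].tolist()).add_prefix('Size_')
df = df.join(parsed)    # adds Size_Name, Size_Value, Size_Unit columns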
If you have:
df['size'][0] = [{'Name': 'Total', 'Value': 50, 'Unit': 'Units'}]
type(df['Size'][0])
list
you should use:
type(df['Size'][0][0])
dict
And if you have several dictionaries in the list, increase the last index to get access to the rest of them.
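A vectorised variant of the same idea pulls the first element out of every list in one step (a sketch, assuming each cell is a single-element list):
df['Size'] = df['Size'].str[0]   # each cell is now the dict itself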

Getting TypeError when trying to retrieve values from keys in a list of dictionaries

I have an array of dictionaries in a pandas DataFrame:
0 [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
1 [{'id': 12, 'name': 'Adventure'}, {'id': 88, 'name': 'Fantasy'}, {'id': 10751, 'name': 'Family'}]
2 [{'id': 10749, 'name': 'Romance'}, {'id': 77, 'name': 'Horror'}]
I am trying to get all the names from a single row into a simple list of Strings, like: "Horror, family, drama" etc for each row in the dataset.
I tried this code but I am getting the error: string indices must be integers
for y in df:
    names = [x['name'] for x in y]
Any help is appreciated.
Iterating over a DataFrame iterates over the names of the columns:
In [15]: df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
In [16]: df
Out[16]:
a b
0 1 4
1 2 5
2 3 6
In [17]: for x in df:
...: print(x)
...:
a
b
It is like a dict, which iterates over its keys.
You need something like:
df['your_column'].apply(lambda x: [d['name'] for d in x])
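To get the comma-separated string per row described in the question, one small extension (genres is just a hypothetical name for the column shown there):
df['names'] = df['genres'].apply(lambda lst: ', '.join(d['name'] for d in lst))
# e.g. 'Animation, Comedy, Family' for the first row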
IIUC, each element is a dict, not a list. You should use .get:
[[y.get('name') for y in x] for x in df['your columns']]
Out[578]:
[['Animation', 'Comedy', 'Family'],
['Adventure', 'Fantasy', 'Family'],
['Romance', 'Horror']]
If the column actually contains strings rather than lists of dicts, convert it first:
import ast
df.a = df.a.apply(ast.literal_eval)

How to write a list of dictionaries into a CSV with multiple values

I have a list of dictionaries in "my_list" as follows:
my_list = [{'Id': '100', 'A': [val1, val2], 'B': [val3, val4], 'C': [val5, val6]},
           {'Id': '200', 'A': [val7, val8], 'B': [val9, val10], 'C': [val11, val12]},
           {'Id': '300', 'A': [val13, val14], 'B': [val15, val16], 'C': [val17, val18]}]
I want to write this list into a CSV file as follows:
ID, A, AA, B, BB, C, CC
100, val1, val2, val3, val4, val5, val6
200, val7, val8, val9, val10, val11, val12
300, val13, val14, val15, val16, val17, val18
Does anyone know how can I handle it?
Tablib should do the trick
I leave here the example from their front page (which you can adapt to the .csv format):
>>> data = tablib.Dataset(headers=['First Name', 'Last Name', 'Age'])
>>> for i in [('Kenneth', 'Reitz', 22), ('Bessie', 'Monke', 21)]:
... data.append(i)
>>> print(data.export('json'))
[{"Last Name": "Reitz", "First Name": "Kenneth", "Age": 22}, {"Last Name": "Monke", "First Name": "Bessie", "Age": 21}]
>>> print(data.export('yaml'))
- {Age: 22, First Name: Kenneth, Last Name: Reitz}
- {Age: 21, First Name: Bessie, Last Name: Monke}
>>> data.export('xlsx')
<censored binary data>
>>> data.export('df')
First Name Last Name Age
0 Kenneth Reitz 22
1 Bessie Monke 21
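Adapted to the structure in the question, a sketch could look like this (val1 etc. stand in for whatever values you actually have; tablib's 'csv' export is used instead of 'json'):
import tablib

data = tablib.Dataset(headers=['ID', 'A', 'AA', 'B', 'BB', 'C', 'CC'])
for row in my_list:
    data.append([row['Id'], *row['A'], *row['B'], *row['C']])

with open('out.csv', 'w', newline='') as f:
    f.write(data.export('csv'))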
You could do this... (replacing print with a csv writerow as appropriate)
print(['ID', 'A', 'AA', 'B', 'BB', 'C', 'CC'])
for row in my_list:
    out_row = []
    out_row.append(row['Id'])
    for v in row['A']:
        out_row.append(v)
    for v in row['B']:
        out_row.append(v)
    for v in row['C']:
        out_row.append(v)
    print(out_row)
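With the csv module spelled out, the same loop might look like this (out.csv is just an example filename):
import csv

with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['ID', 'A', 'AA', 'B', 'BB', 'C', 'CC'])
    for row in my_list:
        writer.writerow([row['Id'], *row['A'], *row['B'], *row['C']])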
You can use pandas to do the trick:
import pandas as pd

my_list = [{'Id': '100', 'A': [val1, val2], 'B': [val3, val4], 'C': [val5, val6]},
           {'Id': '200', 'A': [val7, val8], 'B': [val9, val10], 'C': [val11, val12]},
           {'Id': '300', 'A': [val13, val14], 'B': [val15, val16], 'C': [val17, val18]}]
index = ['Id', 'A', 'AA', 'B', 'BB', 'C', 'CC']
df = pd.DataFrame(data=my_list)
for letter in ['A', 'B', 'C']:
    first = []
    second = []
    for a in df[letter].values.tolist():
        first.append(a[0])
        second.append(a[1])
    df[letter] = first         # first value of each pair, e.g. column A
    df[letter * 2] = second    # second value, e.g. column AA
df = df.reindex(columns=index)
df.to_csv('out.csv')
This produces the following dataframe:
Id A AA B BB C CC
0 100 1 2 3 4 5 6
1 200 7 8 9 10 11 12
2 300 13 14 15 16 17 18
and this is the out.csv-file:
,Id,A,AA,B,BB,C,CC
0,100,1,2,3,4,5,6
1,200,7,8,9,10,11,12
2,300,13,14,15,16,17,18
See the pandas documentation for DataFrame.to_csv (write DataFrame to a comma-separated values file).
