My dataframe is as shown
name key value
john A223 390309
jason B439 230943
peter A5388 572039
john D23902 238939
jason F2390 23930
I want to convert the above dataframe into a dictionary in the format shown below.
{'john': {'key':'A223', 'value':'390309', 'key':'A5388', 'value':'572039'},
'jason': {'key':'B439','value':'230943', 'key':'F2390', 'value':'23930'},
'peter': {'key':'A5388' ,'value':'572039'}}
I tried a = dict(zip(dataframe['key'],dataframe['value'])), but that doesn't give me the dataframe's column headers.
Dictionary keys must be unique
Assuming, as in your desired output, you want to keep only rows with the first instance of each name, you can reverse row order and then use to_dict with orient='index':
res = df.iloc[::-1].set_index('name').to_dict('index')
print(res)
{'jason': {'key': 'B439', 'value': 230943},
'john': {'key': 'A223', 'value': 390309},
'peter': {'key': 'A5388', 'value': 572039}}
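A note for newer pandas versions: to my knowledge, to_dict('index') raises a ValueError when the index contains duplicates, so the one-liner above may fail there. The following is a minimal sketch, not the answerer's original method, that keeps the first row per name with drop_duplicates. It also shows one possible way to keep every row by collecting them in a list per name. The frame is rebuilt from the question's sample data, and the name all_rows is an assumption introduced for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'name': ['john', 'jason', 'peter', 'john', 'jason'],
    'key': ['A223', 'B439', 'A5388', 'D23902', 'F2390'],
    'value': [390309, 230943, 572039, 238939, 23930],
})

# Keep only the first row seen for each name, then index by name.
res = df.drop_duplicates(subset='name').set_index('name').to_dict('index')
print(res)

# Alternative (hypothetical name all_rows): keep every row as a list of
# {'key': ..., 'value': ...} dicts per name, since dict keys cannot repeat.
all_rows = {name: group[['key', 'value']].to_dict('records')
            for name, group in df.groupby('name')}
print(all_rows)
```

The second form is closer to what the question seems to be after, because a plain dictionary cannot hold the same 'key' entry twice.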
I have a dataframe that I extracted from an API. One of its columns holds the data I need, but that data is structured as a list of dictionaries:
I could get the data I care about from that dictionary using this chunk of code:
for k,v in d.items():
    for i,j in v.items():
        if isinstance(j, list):
            for l in range(len(j)):
                for k in j[l]:
                    print(j[l])
I get a structure like this one, so I'd need to get each of the 'values' lists inside the list of dictionaries
and then organize them in a dataframe, for example the first item in the list of dictionaries:
Once I have the above structure, how can I make a dataframe like the one in the image?
Raw data:
data = {'rows': [{'values': ['Tesla Inc (TSLA)', '$1056.78', '$1199.78', '13.53%'], 'children': []}, {'values': ['Taiwan Semiconductor Manufacturing Company Limited (TSM)', '$120.31', '$128.80', '7.06%'], 'children': []}]}
You can use pandas. First cast your data to pd.DataFrame, then use apply(pd.Series) to expand the lists inside the 'values' column into separate columns, and the set_axis method to change the column names:
import pandas as pd
data = {'rows': [{'values': ['Tesla Inc (TSLA)', '$1056.78', '$1199.78', '13.53%'], 'children': []}, {'values': ['Taiwan Semiconductor Manufacturing Company Limited (TSM)', '$120.31', '$128.80', '7.06%'], 'children': []}]}
out = pd.DataFrame(data['rows'])['values'].apply(pd.Series).set_axis(['name','price','price_n','pct'], axis=1)
Output:
name price price_n pct
0 Tesla Inc (TSLA) $1056.78 $1199.78 13.53%
1 Taiwan Semiconductor Manufacturing Company Lim... $120.31 $128.80 7.06%
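An alternative sketch, not taken from the answer above: since each 'values' entry is already a flat list, the lists can be handed straight to the DataFrame constructor, which avoids the apply(pd.Series) step. The column names are the same illustrative ones used above, and this assumes every 'values' list has exactly four items.

```python
import pandas as pd

data = {'rows': [
    {'values': ['Tesla Inc (TSLA)', '$1056.78', '$1199.78', '13.53%'], 'children': []},
    {'values': ['Taiwan Semiconductor Manufacturing Company Limited (TSM)',
                '$120.31', '$128.80', '7.06%'], 'children': []},
]}

# One inner list per row; each list needs as many items as there are column names.
out = pd.DataFrame([row['values'] for row in data['rows']],
                   columns=['name', 'price', 'price_n', 'pct'])
print(out)
```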
I have a dictionary like so:
d = {'3a0fe308-b78d-4080-a68b-84fdcbf5411e': 'SUCCEEDED-HALL-IC_GBI', '7c975c26-f9fc-4579-822d-a1042b82cb17': 'SUCCEEDED-AEN-IC_GBI', '9ff20206-a841-4dbf-a736-a35fcec604f3': 'SUCCEEDED-ESP2-IC_GBI'}
I would like to convert my dictionary into something like this, with all the keys in one list and all the values in another, so that I can make a dataframe from it.
d = {'key': ['3a0fe308-b78d-4080-a68b-84fdcbf5411e', '7c975c26-f9fc-4579-822d-a1042b82cb17', '9ff20206-a841-4dbf-a736-a35fcec604f3'],
'value': ['SUCCEEDED-HALL-IC_GBI', 'SUCCEEDED-AEN-IC_GBI', 'SUCCEEDED-ESP2-IC_GBI']}
What would be the best way to go about this?
You can easily create a DataFrame like this:
import pandas as pd
d = {'3a0fe308-b78d-4080-a68b-84fdcbf5411e': 'SUCCEEDED-HALL-IC_GBI',
'7c975c26-f9fc-4579-822d-a1042b82cb17': 'SUCCEEDED-AEN-IC_GBI',
'9ff20206-a841-4dbf-a736-a35fcec604f3': 'SUCCEEDED-ESP2-IC_GBI'}
table = pd.DataFrame(d.items(), columns=['key', 'value'])
If you just want to rearrange your dictionary, you could do this:
d2 = {'key': list(d.keys()), 'value': list(d.values())}
Since you tagged pandas, try:
pd.Series(d).reset_index(name='value').to_dict('list')
Output:
{'index': ['3a0fe308-b78d-4080-a68b-84fdcbf5411e',
'7c975c26-f9fc-4579-822d-a1042b82cb17',
'9ff20206-a841-4dbf-a736-a35fcec604f3'],
'value': ['SUCCEEDED-HALL-IC_GBI',
'SUCCEEDED-AEN-IC_GBI',
'SUCCEEDED-ESP2-IC_GBI']}
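Note that the first key in this output is 'index' rather than 'key'. If the 'key' label matters, one possible tweak is to name the index before resetting it. This is a sketch using rename_axis; the result name d2 is only illustrative.

```python
import pandas as pd

d = {'3a0fe308-b78d-4080-a68b-84fdcbf5411e': 'SUCCEEDED-HALL-IC_GBI',
     '7c975c26-f9fc-4579-822d-a1042b82cb17': 'SUCCEEDED-AEN-IC_GBI',
     '9ff20206-a841-4dbf-a736-a35fcec604f3': 'SUCCEEDED-ESP2-IC_GBI'}

# rename_axis names the index, so reset_index yields a 'key' column instead of 'index'.
d2 = pd.Series(d).rename_axis('key').reset_index(name='value').to_dict('list')
print(d2)
```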
Pure python:
{'key':list(d.keys()), 'value': list(d.values())}
output:
{'key': ['3a0fe308-b78d-4080-a68b-84fdcbf5411e',
'7c975c26-f9fc-4579-822d-a1042b82cb17',
'9ff20206-a841-4dbf-a736-a35fcec604f3'],
'value': ['SUCCEEDED-HALL-IC_GBI',
'SUCCEEDED-AEN-IC_GBI',
'SUCCEEDED-ESP2-IC_GBI']}
You can create the dataframe by zipping the keys and values together with the zip function:
import pandas as pd
df = pd.DataFrame(list(zip(d.keys(),d.values())), columns=['key','value'])
I am a beginner at programming, so I would appreciate your help and support.
Here is a DataFrame in which one column holds JSON-like data.
ID, Name, Information
1234, xxxx, '{'age': 25, 'gender': 'male'}'
2234, yyyy, '{'age': 34, 'gender': 'female'}'
3234, zzzz, '{'age': 55, 'gender': 'male'}'
I would like to convert this DataFrame as shown below.
ID, Name, age, gender
1234, xxxx, 25, male
2234, yyyy, 34, female
3234, zzzz, 55, male
I found that ast.literal_eval() can convert a str to a dict, but I have no idea how to write the code for this.
Could you please give an example of code that solves this issue?
Given test.csv
ID,Name,Information
1234,xxxx,"{'age': 25, 'gender': 'male'}"
2234,yyyy,"{'age': 34, 'gender': 'female'}"
3234,zzzz,"{'age': 55, 'gender': 'male'}"
Read the file in with pd.read_csv and use the converters parameter with ast.literal_eval, which will convert the data in the Information column from a str type to dict type.
Use pd.json_normalize to unpack the dict with keys as column headers and values in the rows
.join the normalized columns with df
.drop the Information column
import pandas as pd
from ast import literal_eval
df = pd.read_csv('test.csv', converters={'Information': literal_eval})
df = df.join(pd.json_normalize(df.Information))
df.drop(columns=['Information'], inplace=True)
# display(df)
ID Name age gender
0 1234 xxxx 25 male
1 2234 yyyy 34 female
2 3234 zzzz 55 male
If the data is not from a csv file
import pandas as pd
from ast import literal_eval
data = {'ID': [1234, 2234, 3234],
'Name': ['xxxx', 'yyyy', 'zzzz'],
'Information': ["{'age': 25, 'gender': 'male'}", "{'age': 34, 'gender': 'female'}", "{'age': 55, 'gender': 'male'}"]}
df = pd.DataFrame(data)
# apply literal_eval to Information
df.Information = df.Information.apply(literal_eval)
# normalize the Information column and join to df
df = df.join(pd.json_normalize(df.Information))
# drop the Information column
df.drop(columns=['Information'], inplace=True)
If the third column is meant to be a JSON string, the single quote ' is not valid; JSON requires ", so the quotes have to be fixed first.
If the third column is a string representation of a Python dict, you can use eval to convert it (ast.literal_eval is the safer choice for untrusted input).
Sample code that splits the third column into a dict and merges it into the original DataFrame:
import json
import pandas as pd
data = [
    [1234, 'xxxx', "{'age': 25, 'gender': 'male'}"],
    [2234, 'yyyy', "{'age': 34, 'gender': 'female'}"],
    [3234, 'zzzz', "{'age': 55, 'gender': 'male'}"],
]
df = pd.DataFrame(data)
# fix the quotes so each string is valid JSON, then parse it into a dict
df[2] = df[2].apply(lambda x: json.loads(x.replace("'", '"')))
# expand the dicts into columns and join them to the first two columns
merged = pd.concat([df[[0, 1]], df[2].apply(pd.Series)], axis=1)
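One caveat with the quote-replacement approach: it breaks as soon as a value itself contains an apostrophe. Below is a small sketch of the difference, using a made-up record (the 'name' field and its value are assumptions for illustration, not part of the question's data).

```python
import json
from ast import literal_eval

# Hypothetical record whose value contains an apostrophe.
raw = "{'age': 25, 'name': \"O'Brien\"}"

# literal_eval parses the Python-literal syntax directly.
parsed = literal_eval(raw)

# Swapping quotes turns the apostrophe into a stray double quote,
# so the string is no longer valid JSON and json.loads raises.
try:
    json.loads(raw.replace("'", '"'))
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

print(parsed, json_ok)
```

If the strings are known to come from Python's own repr of a dict, literal_eval is probably the more robust choice.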
I have the following pandas dataframe:
code comp name
0 A292340 디비자산운용 마이티 200커버드콜ATM레버리지
1 A291630 키움투자자산운용 KOSEF 코스닥150선물레버리지
2 A278240 케이비자산운용 KBSTAR 코스닥150선물레버리지
3 A267770 미래에셋자산운용 TIGER 200선물레버리지
4 A267490 케이비자산운용 KBSTAR 미국장기국채선물레버리지(합성 H)
I would like to make a dictionary out of it that looks like this:
{'20180408' :{'A292340' : {comp : 디비자산운용}, {name : 마이티 200커버드콜ATM 레버리지}}}
Sorry that the data is in a language foreign to you, but please bear with me.
What I tried is:
values = [comp, name]
names = ['comp', 'name']
tmp = {names:values for names, values in zip(names, values)}
tpm = {code:values for values in zip(*tmp)}
aaaa = {date:c for c in zip(*tpm)}
print(aaaa)
aaaa is what I am trying to get, and date is just a simple list of dates up to today. But when I run this, I get the error
TypeError: unhashable type: 'numpy.ndarray'
Thank you in advance for your answer.
Is this what you want? First, set the "code" column as the index. Then use to_dict with orient="index".
df.set_index("code").to_dict("index")
{'A267490': {'comp': '케이비자산운용', 'name': 'KBSTAR 미국장기국채선물레버리지(합성 H)'},
'A267770': {'comp': '미래에셋자산운용', 'name': 'TIGER 200선물레버리지'},
'A278240': {'comp': '케이비자산운용', 'name': 'KBSTAR 코스닥150선물레버리지'},
'A291630': {'comp': '키움투자자산운용', 'name': 'KOSEF 코스닥150선물레버리지'},
'A292340': {'comp': '디비자산운용', 'name': '마이티 200커버드콜ATM레버리지'}}
Using the argument "index" will give the layout:
{index -> {columnName -> valueOfTheColumn}}
Here since we set code as the index, we have
code -> {"comp" -> comp's value, "name" -> name's value}
'A267490': {'comp': '케이비자산운용', 'name': 'KBSTAR 미국장기국채선물레버리지(합성 H)'}
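To get the date-keyed nesting from the question, the result can be wrapped in one more dictionary. This is a minimal sketch, assuming a single date string ('20180408' is taken from the question's example) and a shortened two-row copy of the sample frame. The name nested is only illustrative.

```python
import pandas as pd

# Shortened copy of the question's sample data.
df = pd.DataFrame({
    'code': ['A292340', 'A291630'],
    'comp': ['디비자산운용', '키움투자자산운용'],
    'name': ['마이티 200커버드콜ATM레버리지', 'KOSEF 코스닥150선물레버리지'],
})

date = '20180408'  # single date key, as in the question's desired output

# Outer key is the date; inner mapping is code -> {'comp': ..., 'name': ...}.
nested = {date: df.set_index('code').to_dict('index')}
print(nested)
```

Because a string is hashable, it works as a dictionary key, whereas using a whole array of dates as a key leads to the "unhashable type" error shown in the question.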
I am doing a research project and trying to pull thousands of quarterly results for companies from the SEC EDGAR API.
Each result is a list of dictionaries structured as follows:
[{'field': 'othercurrentliabilities', 'value': 6886000000.0},
{'field': 'otherliabilities', 'value': 13700000000.0},
{'field': 'propertyplantequipmentnet', 'value': 15789000000.0}...]
I want each result to be a row of a pandas dataframe. The issue is that the results may not all have the same fields, depending on the data available. I would like to check whether each column (field) of the dataframe is present among a result's fields and, if it is, add the result's value to the row. If not, I would like to add np.NaN. How would I go about doing this?
A list/dict comprehension ought to work here:
In [11]: s
Out[11]:
[[{'field': 'othercurrentliabilities', 'value': 6886000000.0},
{'field': 'otherliabilities', 'value': 13700000000.0},
{'field': 'propertyplantequipmentnet', 'value': 15789000000.0}],
[{'field': 'othercurrentliabilities', 'value': 6886000000.0}]]
In [12]: pd.DataFrame([{d["field"]: d["value"] for d in row} for row in s])
Out[12]:
othercurrentliabilities otherliabilities propertyplantequipmentnet
0 6.886000e+09 1.370000e+10 1.578900e+10
1 6.886000e+09 NaN NaN
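If the set of columns should be fixed in advance, as the question describes, reindex can line the result up with a predefined field list and fill absent fields with NaN. This is a sketch that assumes a hypothetical wanted list; the 'goodwill' field is made up to show a column that no result provides.

```python
import pandas as pd

s = [
    [{'field': 'othercurrentliabilities', 'value': 6886000000.0},
     {'field': 'otherliabilities', 'value': 13700000000.0},
     {'field': 'propertyplantequipmentnet', 'value': 15789000000.0}],
    [{'field': 'othercurrentliabilities', 'value': 6886000000.0}],
]

# Hypothetical list of columns the final frame should always have.
wanted = ['othercurrentliabilities', 'otherliabilities', 'goodwill']

df = pd.DataFrame([{d['field']: d['value'] for d in row} for row in s])
# Keep only the wanted columns, in this order; columns missing from the data are filled with NaN.
df = df.reindex(columns=wanted)
print(df)
```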
Make a list of df.result.rows[x]['values'], as below:
s = []
for x in range(df.result.totalrows[0]):
    s = s + [df.result.rows[x]['values']]
    print(x)
df1 = pd.DataFrame([{d["field"]: d["value"] for d in row} for row in s])
df1
This will give you the result.