Row wise operation in Pandas DataFrame - python

I have a Dataframe as
import pandas as pd
df = pd.DataFrame({
    "First": ['First1', 'First2', 'First3'],
    "Secnd": ['Secnd1', 'Secnd2', 'Secnd3']
})
df.index = ['Row1', 'Row2', 'Row3']
I would like to use a lambda function in the apply method to create a list of dictionaries (including the index item), as below:
[
{'Row1': ['First1', 'Secnd1']},
{'Row2': ['First2', 'Secnd2']},
{'Row3': ['First3', 'Secnd3']},
]
If I use something like .apply(lambda x: <some operation>) here, x contains only the values, not the index.
Cheers,
DD

To expand Hans Bambel's answer to get the exact desired output:
[{k: list(v.values())} for k, v in df.to_dict('index').items()]

You don't need apply here. You can just use the to_dict() function with the "index" argument:
df.to_dict("index")
This gives the output:
{'Row1': {'First': 'First1', 'Secnd': 'Secnd1'},
'Row2': {'First': 'First2', 'Secnd': 'Secnd2'},
'Row3': {'First': 'First3', 'Secnd': 'Secnd3'}}
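If you do want the apply-based route the question asks about, one option is to apply row-wise with axis=1: each row is then passed as a Series whose name attribute holds the index label. This is only a minimal sketch using the question's sample frame; the variable name result is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "First": ["First1", "First2", "First3"],
        "Secnd": ["Secnd1", "Secnd2", "Secnd3"],
    },
    index=["Row1", "Row2", "Row3"],
)

# With axis=1 each row arrives as a Series; row.name is its index label.
result = df.apply(lambda row: {row.name: row.tolist()}, axis=1).tolist()
print(result)
```

For large frames the to_dict-based comprehension is probably the better choice, since apply calls the lambda once per row.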

Related

Pandas Dataframe from list nested in json

I have a request that gets me some data that looks like this:
[{'__rowType': 'META',
'__type': 'units',
'data': [{'name': 'units.unit', 'type': 'STRING'},
{'name': 'units.classification', 'type': 'STRING'}]},
{'__rowType': 'DATA', '__type': 'units', 'data': ['A', 'Energie']},
{'__rowType': 'DATA', '__type': 'units', 'data': ['bar', ' ']},
{'__rowType': 'DATA', '__type': 'units', 'data': ['CCM', 'Volumen']},
{'__rowType': 'DATA', '__type': 'units', 'data': ['CDM', 'Volumen']}]
and would like to construct a (Pandas) DataFrame that looks like this:
Things like pd.DataFrame(pd.json_normalize(test)['data']) are close but still put the whole list into one column instead of making separate columns. record_path sounded right, but I can't get it to work correctly either.
Any help?
It's difficult to know how the example generalizes, but for this particular case you could use:
pd.DataFrame([d['data'] for d in test
              if d.get('__rowType', None)=='DATA' and 'data' in d],
             columns=['unit', 'classification']
            )
NB: this assumes test is the input list.
output:
  unit classification
0    A        Energie
1  bar
2  CCM        Volumen
3  CDM        Volumen
Instead of just giving you the code, I'll first explain the approach in detail, then show the exact steps and the final code, so that you can handle similar situations in the future.
When you want to create a pandas dataframe with two columns you can do this by creating a dictionary and passing it to DataFrame class:
my_data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=my_data)
This results in the following dataframe:
   col1  col2
0     1     3
1     2     4
So to get the dataframe from your question, the my_data dictionary should look like this:
import numpy as np
my_data = {
    'unit': ['A', 'bar', 'CCM', 'CDM'],
    'classification': ['Energie', '', 'Volumen', 'Volumen'],
}
df = pd.DataFrame(data=my_data)
df.index = np.arange(1, len(df)+1)
df
(Note the df.index = ... line: it is there because the index of the desired dataframe in your question starts at 1.)
So all you have to do is extract these values from the data you provided and convert them into exactly the my_data dictionary shown above.
You can do that like this:
# This will get the data values like 'bar', 'CCM' and etc from your initial data
values = [x['data'] for x in d if x['__rowType']=='DATA']
# This gets the columns names from meta data
meta = list(filter(lambda x: x['__rowType']=='META', d))[0]
columns = [x['name'].split('.')[-1] for x in meta['data']]
# This line creates the exact dictionary we need to send to DataFrame class.
my_data = {column:[v[i] for v in values] for i, column in enumerate(columns)}
So the whole code would be this:
import numpy as np
import pandas as pd
d = YOUR_DATA  # the list returned by your request
# This will get the data values like 'bar', 'CCM' and etc
values = [x['data'] for x in d if x['__rowType']=='DATA']
# This gets the columns names from meta data
meta = list(filter(lambda x: x['__rowType']=='META', d))[0]
columns = [x['name'].split('.')[-1] for x in meta['data']]
# This line creates the exact dictionary we need to send to DataFrame class.
my_data = {column:[v[i] for v in values] for i, column in enumerate(columns)}
df = pd.DataFrame(data=my_data, )
df.index = np.arange(1, len(df)+1)
df #or print(df)
Note: Of course you could do all of this in one complex line of code, but to avoid confusion I split it across a few lines.
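As a more compact variant of the same idea, the DATA rows can be passed to the DataFrame constructor directly, with the column names taken from the META entry, so the intermediate column dictionary is not needed. This is a sketch that assumes the input has exactly one META entry and that every DATA row has one value per META field; test is the sample list from the question.

```python
import pandas as pd

test = [
    {"__rowType": "META", "__type": "units",
     "data": [{"name": "units.unit", "type": "STRING"},
              {"name": "units.classification", "type": "STRING"}]},
    {"__rowType": "DATA", "__type": "units", "data": ["A", "Energie"]},
    {"__rowType": "DATA", "__type": "units", "data": ["bar", " "]},
    {"__rowType": "DATA", "__type": "units", "data": ["CCM", "Volumen"]},
    {"__rowType": "DATA", "__type": "units", "data": ["CDM", "Volumen"]},
]

# Column names come from the (assumed single) META entry: 'units.unit' -> 'unit'.
meta = next(item for item in test if item["__rowType"] == "META")
columns = [field["name"].split(".")[-1] for field in meta["data"]]

# Each DATA entry already holds one row as a list of values.
rows = [item["data"] for item in test if item["__rowType"] == "DATA"]

df = pd.DataFrame(rows, columns=columns)
print(df)
```

Unlike the longer version, this keeps the default index starting at 0; set df.index afterwards if you need it to start at 1.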

Pandas: Convert dictionary to dataframe where keys and values are the columns

I have a dictionary like so:
d = {'3a0fe308-b78d-4080-a68b-84fdcbf5411e': 'SUCCEEDED-HALL-IC_GBI', '7c975c26-f9fc-4579-822d-a1042b82cb17': 'SUCCEEDED-AEN-IC_GBI', '9ff20206-a841-4dbf-a736-a35fcec604f3': 'SUCCEEDED-ESP2-IC_GBI'}
I would like to convert my dictionary into something like this to make a dataframe where I put all the keys and values in a separate list.
d = {'key': ['3a0fe308-b78d-4080-a68b-84fdcbf5411e', '7c975c26-f9fc-4579-822d-a1042b82cb17', '9ff20206-a841-4dbf-a736-a35fcec604f3'],
'value': ['SUCCEEDED-HALL-IC_GBI', 'SUCCEEDED-AEN-IC_GBI', 'SUCCEEDED-ESP2-IC_GBI']}
What would be the best way to go about this?
You can easily create a DataFrame like this:
import pandas as pd
d = {'3a0fe308-b78d-4080-a68b-84fdcbf5411e': 'SUCCEEDED-HALL-IC_GBI',
'7c975c26-f9fc-4579-822d-a1042b82cb17': 'SUCCEEDED-AEN-IC_GBI',
'9ff20206-a841-4dbf-a736-a35fcec604f3': 'SUCCEEDED-ESP2-IC_GBI'}
table = pd.DataFrame(d.items(), columns=['key', 'value'])
If you just want to rearrange your Dictionary you could do this:
d2 = {'key': list(d.keys()), 'value': list(d.values())}
Since you tagged pandas, try:
pd.Series(d).reset_index(name='value').to_dict('list')
Output:
{'index': ['3a0fe308-b78d-4080-a68b-84fdcbf5411e',
'7c975c26-f9fc-4579-822d-a1042b82cb17',
'9ff20206-a841-4dbf-a736-a35fcec604f3'],
'value': ['SUCCEEDED-HALL-IC_GBI',
'SUCCEEDED-AEN-IC_GBI',
'SUCCEEDED-ESP2-IC_GBI']}
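Note that the output above uses 'index' rather than the 'key' name the question asks for. One way to get 'key' is to name the Series index before resetting it. This is a small sketch, with the variable name out chosen here for illustration.

```python
import pandas as pd

d = {
    "3a0fe308-b78d-4080-a68b-84fdcbf5411e": "SUCCEEDED-HALL-IC_GBI",
    "7c975c26-f9fc-4579-822d-a1042b82cb17": "SUCCEEDED-AEN-IC_GBI",
    "9ff20206-a841-4dbf-a736-a35fcec604f3": "SUCCEEDED-ESP2-IC_GBI",
}

# rename_axis names the index, so reset_index produces a 'key' column
# instead of the default 'index'.
out = pd.Series(d).rename_axis("key").reset_index(name="value").to_dict("list")
print(out)
```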
Pure python:
{'key':list(d.keys()), 'value': list(d.values())}
output:
{'key': ['3a0fe308-b78d-4080-a68b-84fdcbf5411e',
'7c975c26-f9fc-4579-822d-a1042b82cb17',
'9ff20206-a841-4dbf-a736-a35fcec604f3'],
'value': ['SUCCEEDED-HALL-IC_GBI',
'SUCCEEDED-AEN-IC_GBI',
'SUCCEEDED-ESP2-IC_GBI']}
You can create the dataframe zipping the key/value lists with zip function:
import pandas as pd
df = pd.DataFrame(list(zip(d.keys(),d.values())), columns=['key','value'])

Most efficient way to place a Pandas data frame into a list of dictionaries with a certain format

I have a Pandas data frame that contains one column and an index of timestamps. The code for the data frame looks something like this:
import pandas as pd
indx = pd.date_range(start = '12-12-2020 06:00:00',end = '12-12-2020 06:02:00',freq = 'T')
df = pd.DataFrame(data = [0.2,0.4,0.6],index = indx,columns = ['colname'])
I want to create a list of dictionaries from the rows of df in a certain way. For each row of the data frame, I want to create a dictionary with the keys "Timestamp" and "Value". The value of the "Timestamp" key will be the index of that row. The value of the "Value" key will be the value of the row in the data frame columns. Each of these dictionaries will be appended to a list.
I know I can do this by looping over all of the rows of the data frame like this:
dict_list = []
for i in range(df.shape[0]):
    new_dict = {'Timestamp': df.index[i],'Value': df.iloc[i,0]}
    dict_list.append(new_dict)
However, the data frames I'm actually working with may be very large. Is there a faster, more efficient way of doing this other than using a for loop?
You need to rename your column, give your index a name, and turn it into a column. Then use DataFrame.to_dict with the 'records' orientation (older pandas versions also accepted the abbreviation 'r').
df = df.rename(columns={'colname': 'Value'}).rename_axis(index='Timestamp').reset_index()
dict_list = df.to_dict('records')
#[{'Timestamp': Timestamp('2020-12-12 06:00:00'), 'Value': 0.2},
# {'Timestamp': Timestamp('2020-12-12 06:01:00'), 'Value': 0.4},
# {'Timestamp': Timestamp('2020-12-12 06:02:00'), 'Value': 0.6}]
For larger DataFrames this is somewhat faster than a plain loop, but it still slows down as the frame grows:
import perfplot
import pandas as pd
import numpy as np
def loop(df):
    dict_list = []
    for i in range(df.shape[0]):
        new_dict = {'Timestamp': df.index[i],'Value': df.iloc[i,0]}
        dict_list.append(new_dict)
    return dict_list
def df_to_dict(df):
    df = df.rename(columns={'colname': 'Value'}).rename_axis(index='Timestamp').reset_index()
    return df.to_dict('records')
perfplot.show(
    setup=lambda n: pd.DataFrame({'colname': np.random.normal(0,1,n)},
                                 index=pd.date_range('12-12-2020', freq = 'T', periods=n)),
    kernels=[
        lambda df: loop(df),
        lambda df: df_to_dict(df),
    ],
    labels=['Loop', 'df.to_dict'],
    n_range=[2 ** k for k in range(20)],
    equality_check=None,
    xlabel='len(df)'
)
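Another candidate that may be worth timing alongside the two kernels above is a list comprehension over zip, which avoids the positional .iloc lookups of the loop. This is only a sketch on the question's three-row frame and makes no claim about how it compares on your data. It uses the 'min' frequency alias because, as far as I know, recent pandas releases deprecate 'T'.

```python
import pandas as pd

# Same sample frame as in the question ('min' means one-minute steps).
indx = pd.date_range(start="2020-12-12 06:00:00", periods=3, freq="min")
df = pd.DataFrame(data=[0.2, 0.4, 0.6], index=indx, columns=["colname"])

# Pair each index entry with the matching value from the single column.
dict_list = [
    {"Timestamp": ts, "Value": val}
    for ts, val in zip(df.index, df["colname"])
]
print(dict_list)
```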

How to convert a multi-indexed dataframe, a dataframe grouped by multi columns to nested json

My Pandas Series, which I got from applying groupby operation on DataFrame with columns 'var' and 'month' and applying sum on the corresponding data looks like this ('var' and 'month' are indexes below) :
var  month
X    Feb     -0.061575
     Jan      1.366478
Y    Feb     -1.310896
Z    Apr      0.053076
     Feb      1.292415
     Mar      0.375144
P    Feb      1.241288
     Mar      0.613453
What I want is JSON created from the above data in a format like this:
'data':[{'label': 'X', 'data': ['Jan': 1.366478, 'Feb': -0.061575]}, ... ]
I know the basic pandas .to_json() may not work here. Probably a combination of list comprehension, lambda function etc. can work here?
The closest I could think of is :
dict = {k: df[k].to_dict() for k in df.index.levels[0]}
This produces {'X': {'Feb': -0.06157474257929787, 'Jan': 1.366478487212244},'Y': ...}
Any help is appreciated.
Thanks
'data':[{'label': 'X', 'data': ['Jan': 1.366478, 'Feb': -0.061575]}, ... ]
This is invalid JSON: the inner list contains key: value pairs, which are only allowed inside an object ({...}), so it would have to be a dictionary instead.
For me, the solution I found is the piece of code below (assuming group_data holds the already grouped data from a DataFrame).
group_dict = {k: group_data[k].to_dict() for k in group_data.index.levels[0]}
group_list = []
for k, v in group_dict.items():
    entry = {'label': k, 'data': v}  # named entry to avoid shadowing the built-in dict
    group_list.append(entry)
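The same result can be sketched by grouping on the outer index level and serializing with the standard json module. The Series below is rebuilt by hand from the values shown in the question, and the names group_data and payload are assumptions for illustration. The inner 'data' is a dictionary rather than the list from the question, since that is what makes the JSON valid.

```python
import json

import pandas as pd

# Rebuild the grouped Series from the question (values copied from its printout).
index = pd.MultiIndex.from_tuples(
    [("X", "Feb"), ("X", "Jan"), ("Y", "Feb"), ("Z", "Apr"),
     ("Z", "Feb"), ("Z", "Mar"), ("P", "Feb"), ("P", "Mar")],
    names=["var", "month"],
)
group_data = pd.Series(
    [-0.061575, 1.366478, -1.310896, 0.053076,
     1.292415, 0.375144, 1.241288, 0.613453],
    index=index,
)

# One entry per 'var'; dropping that level leaves a month -> value mapping.
group_list = [
    {"label": label, "data": sub.droplevel("var").to_dict()}
    for label, sub in group_data.groupby(level="var", sort=False)
]

payload = json.dumps({"data": group_list})
print(payload)
```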

Map two dataframes and perform sum operation using a dictionary

I have a dataframe df
df
   Object        Action  Cost1  Cost2
0     123      renovate  10000   2000
1     456  do something      0     10
2     789        review   1000     50
and a dictionary (called dictionary)
dictionary
{'Object_new': ['Object'],
'Action_new': ['Action'],
'Total_Cost': ['Cost1', 'Cost2']}
Further, I have an (initially empty) dataframe df_new that should contain almost the same information as df, except that the column names need to be different (named according to the dictionary) and that some columns from df should be consolidated (e.g. by a sum operation) based on the dictionary.
The result should look like this:
df_new
   Object_new    Action_new  Total_Cost
0         123      renovate       12000
1         456  do something          10
2         789        review        1050
How can I achieve this result using only the dictionary? I tried to use the .map() function but could not figure out how to perform the sum-operation with it.
The code to reproduce both dataframes and the dictionary are attached:
# import libraries
import pandas as pd
### create df
data_df = {'Object': [123, 456, 789],
'Action': ['renovate', 'do something', 'review'],
'Cost1': [10000, 0, 1000],
'Cost2': [2000, 10, 50],
}
df = pd.DataFrame(data_df)
### create dictionary
dictionary = {'Object_new':['Object'],
'Action_new':['Action'],
'Total_Cost' : ['Cost1', 'Cost2']}
### create df_new
# data_df_new = pd.DataFrame(columns=['Object_new', 'Action_new', 'Total_Cost' ])
data_df_new = {'Object_new': [123, 456, 789],
'Action_new': ['renovate', 'do something', 'review'],
'Total_Cost': [12000, 10, 1050],
}
df_new = pd.DataFrame(data_df_new)
A play with groupby:
inv_dict = {x:k for k,v in dictionary.items() for x in v}
df_new = df.groupby(df.columns.map(inv_dict),
axis=1).sum()
Output:
     Action_new  Object_new  Total_Cost
0      renovate         123       12000
1  do something         456          10
2        review         789        1050
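As far as I know, recent pandas releases deprecate axis=1 in groupby, so here is a sketch of a dictionary-driven alternative that builds each new column directly. It assumes that multi-column entries should be summed row-wise and that single-column entries are copied unchanged. It also keeps the column order of the dictionary.

```python
import pandas as pd

df = pd.DataFrame({
    "Object": [123, 456, 789],
    "Action": ["renovate", "do something", "review"],
    "Cost1": [10000, 0, 1000],
    "Cost2": [2000, 10, 50],
})
dictionary = {
    "Object_new": ["Object"],
    "Action_new": ["Action"],
    "Total_Cost": ["Cost1", "Cost2"],
}

# One new column per dictionary key: sum when several source columns are
# listed, otherwise copy the single source column as is.
df_new = pd.DataFrame({
    new_name: df[cols].sum(axis=1) if len(cols) > 1 else df[cols[0]]
    for new_name, cols in dictionary.items()
})
print(df_new)
```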
Given the complexity of your algorithm, I would suggest performing a Series addition operation to solve this problem.
Why? In Pandas, every column in a DataFrame works as a Series under the hood.
data_df_new = {
'Object_new': df['Object'],
'Action_new': df['Action'],
'Total_Cost': (df['Cost1'] + df['Cost2']) # Addition of two series
}
df_new = pd.DataFrame(data_df_new)
Running this code builds df_new from the data_df_new dictionary, whose values are the Series taken (or summed) from df.
You can use an empty data frame to copy the new column and use the to_dict to convert it to a dictionary.
import pandas as pd
import numpy as np
data_df = {'Object': [123, 456, 789],
'Action': ['renovate', 'do something', 'review'],
'Cost1': [10000, 0, 1000],
'Cost2': [2000, 10, 50],
}
df = pd.DataFrame(data_df)
print(df)
MyEmptydf = pd.DataFrame()
MyEmptydf['Object_new']=df['Object']
MyEmptydf['Action_new']=df['Action']
MyEmptydf['Total_Cost'] = df['Cost1'] + df['Cost2']
print(MyEmptydf)
dictionary = MyEmptydf.to_dict(orient="index")
print(dictionary)
You can run the code here: https://repl.it/repls/RealisticVillainousGlueware
If you are trying to avoid pandas entirely and use only the dictionary, this should solve it:
Object = []
totalcost = []
action = []
for i in range(0,3):
    Object.append(data_df['Object'][i])
    totalcost.append(data_df['Cost1'][i]+data_df['Cost2'][i])
    action.append(data_df['Action'][i])
dict2 = {'Object':Object, 'Action':action, 'TotalCost':totalcost}
