I have a dictionary like so:
d = {'3a0fe308-b78d-4080-a68b-84fdcbf5411e': 'SUCCEEDED-HALL-IC_GBI', '7c975c26-f9fc-4579-822d-a1042b82cb17': 'SUCCEEDED-AEN-IC_GBI', '9ff20206-a841-4dbf-a736-a35fcec604f3': 'SUCCEEDED-ESP2-IC_GBI'}
I would like to convert my dictionary into something like this, where all the keys and all the values are collected into separate lists, so that I can build a dataframe from it:
d = {'key': ['3a0fe308-b78d-4080-a68b-84fdcbf5411e', '7c975c26-f9fc-4579-822d-a1042b82cb17', '9ff20206-a841-4dbf-a736-a35fcec604f3'],
     'value': ['SUCCEEDED-HALL-IC_GBI', 'SUCCEEDED-AEN-IC_GBI', 'SUCCEEDED-ESP2-IC_GBI']}
What would be the best way to go about this?
You can easily create a DataFrame like this:
import pandas as pd
d = {'3a0fe308-b78d-4080-a68b-84fdcbf5411e': 'SUCCEEDED-HALL-IC_GBI',
'7c975c26-f9fc-4579-822d-a1042b82cb17': 'SUCCEEDED-AEN-IC_GBI',
'9ff20206-a841-4dbf-a736-a35fcec604f3': 'SUCCEEDED-ESP2-IC_GBI'}
table = pd.DataFrame(d.items(), columns=['key', 'value'])
If you just want to rearrange your dictionary, you could do this:
d2 = {'key': list(d.keys()), 'value': list(d.values())}
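If the end goal is the dataframe itself, that rearranged dict-of-lists shape can also be passed straight to the DataFrame constructor. A minimal sketch (the short keys here are stand-ins for the full UUIDs):

```python
import pandas as pd

d = {'3a0fe308': 'SUCCEEDED-HALL-IC_GBI', '7c975c26': 'SUCCEEDED-AEN-IC_GBI'}
d2 = {'key': list(d.keys()), 'value': list(d.values())}

# A dict of equal-length lists maps directly onto DataFrame columns.
df = pd.DataFrame(d2)
print(df.columns.tolist())  # ['key', 'value']
```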
Since you tagged pandas, try:
pd.Series(d).reset_index(name='value').to_dict('list')
Output:
{'index': ['3a0fe308-b78d-4080-a68b-84fdcbf5411e',
'7c975c26-f9fc-4579-822d-a1042b82cb17',
'9ff20206-a841-4dbf-a736-a35fcec604f3'],
'value': ['SUCCEEDED-HALL-IC_GBI',
'SUCCEEDED-AEN-IC_GBI',
'SUCCEEDED-ESP2-IC_GBI']}
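If you want that first key to be 'key' rather than 'index', one option (a sketch, again using shortened stand-in keys) is to name the Series index before resetting it:

```python
import pandas as pd

d = {'a1': 'SUCCEEDED-HALL-IC_GBI', 'b2': 'SUCCEEDED-AEN-IC_GBI'}

# rename_axis names the index, so reset_index emits a 'key' column.
out = pd.Series(d).rename_axis('key').reset_index(name='value').to_dict('list')
```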
Pure python:
{'key': list(d.keys()), 'value': list(d.values())}
output:
{'key': ['3a0fe308-b78d-4080-a68b-84fdcbf5411e',
'7c975c26-f9fc-4579-822d-a1042b82cb17',
'9ff20206-a841-4dbf-a736-a35fcec604f3'],
'value': ['SUCCEEDED-HALL-IC_GBI',
'SUCCEEDED-AEN-IC_GBI',
'SUCCEEDED-ESP2-IC_GBI']}
You can create the dataframe by zipping the keys and values with the zip function:
import pandas as pd
df = pd.DataFrame(list(zip(d.keys(), d.values())), columns=['key', 'value'])
Related
I would like to take my pandas Dataframe and convert it to a list of dictionaries. I can do this using the pandas to_dict('records') function. However, this function takes any column values that are lists and returns numpy arrays. I would like for the content of the returned list of dictionaries to be base python objects rather than numpy arrays.
I understand I could iterate my outputted dictionaries but I was wondering if there is something more clever to do this.
Here is some sample code that shows my problem:
import pandas as pd
import numpy as np
data = pd.concat([
    pd.Series(['a--b', 'c--d', 'e--f'], name='key'),
    pd.Series(['123', '456', '789'], name='code'),
    pd.Series([np.array(['123', '098']), np.array(['000', '999']), np.array(['789', '432'])], name='codes')
], axis=1)
output = data.to_dict('records')
# this prints <class 'numpy.ndarray'>
print(type(output[0]['codes']))
output, in this case, looks like this:
[{'key': 'a--b', 'code': '123', 'codes': array(['123', '098'], dtype='<U3')},
{'key': 'c--d', 'code': '456', 'codes': array(['000', '999'], dtype='<U3')},
{'key': 'e--f', 'code': '789', 'codes': array(['789', '432'], dtype='<U3')}]
I would like for that print statement to print a list. I understand I could simply do the following:
for row in output:
    row['codes'] = row['codes'].tolist()

# this now prints <class 'list'>, which is what I want
print(type(output[0]['codes']))
However, my dataframe is of course much more complicated than this, and I have multiple columns that are numpy arrays. I know I could expand the snippet above to check which columns are array type and cast them using tolist(), but I'm wondering if there is something snappier or more clever? Perhaps something provided by Pandas that is optimized?
To be clear, here is the output I need to have:
print(output)
[{'key': 'a--b', 'code': '123', 'codes': ['123', '098']},
{'key': 'c--d', 'code': '456', 'codes': ['000', '999']},
{'key': 'e--f', 'code': '789', 'codes': ['789', '432']}]
First use applymap to convert the numpy arrays to Python lists, then use to_dict:
cols = ['codes']
data.assign(**data[cols].applymap(list)).to_dict('records')
[{'key': 'a--b', 'code': '123', 'codes': ['123', '098']},
{'key': 'c--d', 'code': '456', 'codes': ['000', '999']},
{'key': 'e--f', 'code': '789', 'codes': ['789', '432']}]
I ended up creating a list of the numpy-typed column names:
np_fields = ['codes']
and then I replaced each field in place in my dataframe:
for col in np_fields:
    data[col] = data[col].map(np.ndarray.tolist)
I then called data.to_dict('records') once that was complete.
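If maintaining np_fields by hand becomes a burden, one alternative is to detect the array-valued columns automatically before converting. This is a sketch: it inspects only the first row, so it assumes each column is homogeneously typed.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'key': ['a--b', 'c--d'],
    'codes': [np.array(['123', '098']), np.array(['000', '999'])],
})

# Infer which columns hold numpy arrays by checking the first value.
np_fields = [c for c in data.columns
             if isinstance(data[c].iloc[0], np.ndarray)]
for col in np_fields:
    data[col] = data[col].map(np.ndarray.tolist)

output = data.to_dict('records')
```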
I have a request that gets me some data that looks like this:
[{'__rowType': 'META',
'__type': 'units',
'data': [{'name': 'units.unit', 'type': 'STRING'},
{'name': 'units.classification', 'type': 'STRING'}]},
{'__rowType': 'DATA', '__type': 'units', 'data': ['A', 'Energie']},
{'__rowType': 'DATA', '__type': 'units', 'data': ['bar', ' ']},
{'__rowType': 'DATA', '__type': 'units', 'data': ['CCM', 'Volumen']},
{'__rowType': 'DATA', '__type': 'units', 'data': ['CDM', 'Volumen']}]
and would like to construct a (Pandas) DataFrame that looks like this:
Things like pd.DataFrame(pd.json_normalize(test)['data']) are close but still throw the whole list into the column instead of making separate columns. record_path sounded right but I can't get it to work correctly either.
Any help?
It's difficult to know how the example generalizes, but for this particular case you could use:
pd.DataFrame([d['data'] for d in test
              if d.get('__rowType', None) == 'DATA' and 'data' in d],
             columns=['unit', 'classification'])
NB: assuming test is the input list.
output:
unit classification
0 A Energie
1 bar
2 CCM Volumen
3 CDM Volumen
Instead of just giving you the code, I'll first explain how you can do this in detail, and then show the exact steps to follow and the final code. This way you'll understand everything for any similar situation.
When you want to create a pandas dataframe with two columns you can do this by creating a dictionary and passing it to DataFrame class:
my_data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=my_data)
This will result in this dataframe:
So if you want to have the dataframe you specified in your question, the my_data dictionary should look like this:
import numpy as np
import pandas as pd

my_data = {
    'unit': ['A', 'bar', 'CCM', 'CDM'],
    'classification': ['Energie', '', 'Volumen', 'Volumen'],
}
df = pd.DataFrame(data=my_data)
df.index = np.arange(1, len(df) + 1)
df
(Note the df.index = ... part. This is there because the index of the desired dataframe starts at 1 in your question.)
Now you just have to extract these values from the data you provided and convert them into the exact my_data dictionary shown above.
To do so you can do this:
# This gets the data values like 'bar', 'CCM', etc. from your initial data
values = [x['data'] for x in d if x['__rowType'] == 'DATA']
# This gets the column names from the metadata
meta = list(filter(lambda x: x['__rowType'] == 'META', d))[0]
columns = [x['name'].split('.')[-1] for x in meta['data']]
# This creates the exact dictionary we need to pass to the DataFrame class
my_data = {column: [v[i] for v in values] for i, column in enumerate(columns)}
So the whole code would be this:
d = YOUR_DATA
# This gets the data values like 'bar', 'CCM', etc.
values = [x['data'] for x in d if x['__rowType'] == 'DATA']
# This gets the column names from the metadata
meta = list(filter(lambda x: x['__rowType'] == 'META', d))[0]
columns = [x['name'].split('.')[-1] for x in meta['data']]
# This creates the exact dictionary we need to pass to the DataFrame class
my_data = {column: [v[i] for v in values] for i, column in enumerate(columns)}
df = pd.DataFrame(data=my_data)
df.index = np.arange(1, len(df) + 1)
df  # or print(df)
Note: Of course you could do all of this in one complex line of code, but to avoid confusion I decided to split it into a few lines.
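For reference, a compact version of the same steps might look like this (a sketch assuming the same META/DATA structure as in the question, with a shortened sample list):

```python
import pandas as pd

d = [
    {'__rowType': 'META', '__type': 'units',
     'data': [{'name': 'units.unit', 'type': 'STRING'},
              {'name': 'units.classification', 'type': 'STRING'}]},
    {'__rowType': 'DATA', '__type': 'units', 'data': ['A', 'Energie']},
    {'__rowType': 'DATA', '__type': 'units', 'data': ['bar', ' ']},
]

# Pull the column names from the META row and the rows from the DATA rows.
meta = next(x for x in d if x['__rowType'] == 'META')
columns = [c['name'].split('.')[-1] for c in meta['data']]
df = pd.DataFrame([x['data'] for x in d if x['__rowType'] == 'DATA'],
                  columns=columns)
```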
I have a Dataframe as
import pandas as pd
df = pd.DataFrame({
    "First": ['First1', 'First2', 'First3'],
    "Secnd": ['Secnd1', 'Secnd2', 'Secnd3']
})
df.index = ['Row1', 'Row2', 'Row3']
I would like to use a lambda function in the apply method to create a list of dictionaries (including the index item) as below:
[
    {'Row1': ['First1', 'Secnd1']},
    {'Row2': ['First2', 'Secnd2']},
    {'Row3': ['First3', 'Secnd3']},
]
If I use something like .apply(lambda x: <some operation>) here, x contains only the values, not the index.
Cheers,
DD
To expand Hans Bambel's answer to get the exact desired output:
[{k: list(v.values())} for k, v in df.to_dict('index').items()]
You don't need apply here. You can just use the to_dict() function with the "index" argument:
df.to_dict("index")
This gives the output:
{'Row1': {'First': 'First1', 'Secnd': 'Secnd1'},
'Row2': {'First': 'First2', 'Secnd': 'Secnd2'},
'Row3': {'First': 'First3', 'Secnd': 'Secnd3'}}
I have a Pandas data frame that contains one column and an index of timestamps. The code for the data frame looks something like this:
import pandas as pd
indx = pd.date_range(start='12-12-2020 06:00:00', end='12-12-2020 06:02:00', freq='T')
df = pd.DataFrame(data=[0.2, 0.4, 0.6], index=indx, columns=['colname'])
I want to create a list of dictionaries from the rows of df in a certain way. For each row of the data frame, I want to create a dictionary with the keys "Timestamp" and "Value". The value of the "Timestamp" key will be the index of that row. The value of the "Value" key will be the value of the row in the data frame columns. Each of these dictionaries will be appended to a list.
I know I can do this by looping over all of the rows of the data frame like this:
dict_list = []
for i in range(df.shape[0]):
    new_dict = {'Timestamp': df.index[i], 'Value': df.iloc[i, 0]}
    dict_list.append(new_dict)
However, the data frames I'm actually working with may be very large. Is there a faster, more efficient way of doing this other than using a for loop?
You need to rename your column and give your Index a name and turn it into a column. Then you want DataFrame.to_dict using the 'records' ('r') orientation.
df = df.rename(columns={'colname': 'Value'}).rename_axis(index='Timestamp').reset_index()
dict_list = df.to_dict('records')
#[{'Timestamp': Timestamp('2020-12-12 06:00:00'), 'Value': 0.2},
# {'Timestamp': Timestamp('2020-12-12 06:01:00'), 'Value': 0.4},
# {'Timestamp': Timestamp('2020-12-12 06:02:00'), 'Value': 0.6}]
For larger DataFrames this is a bit faster than simple looping, but it still slows down as the data grows.
import perfplot
import pandas as pd
import numpy as np
def loop(df):
    dict_list = []
    for i in range(df.shape[0]):
        new_dict = {'Timestamp': df.index[i], 'Value': df.iloc[i, 0]}
        dict_list.append(new_dict)
    return dict_list

def df_to_dict(df):
    df = df.rename(columns={'colname': 'Value'}).rename_axis(index='Timestamp').reset_index()
    return df.to_dict('records')

perfplot.show(
    setup=lambda n: pd.DataFrame({'colname': np.random.normal(0, 1, n)},
                                 index=pd.date_range('12-12-2020', freq='T', periods=n)),
    kernels=[
        lambda df: loop(df),
        lambda df: df_to_dict(df),
    ],
    labels=['Loop', 'df.to_dict'],
    n_range=[2 ** k for k in range(20)],
    equality_check=None,
    xlabel='len(df)'
)
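Another option worth including in such a benchmark is a plain comprehension over zip(df.index, df['colname']), which avoids the per-row .iloc lookups of the original loop. A sketch using the question's sample frame:

```python
import pandas as pd

indx = pd.date_range(start='12-12-2020 06:00:00', end='12-12-2020 06:02:00', freq='T')
df = pd.DataFrame(data=[0.2, 0.4, 0.6], index=indx, columns=['colname'])

# zip pairs each timestamp with its value in one pass.
dict_list = [{'Timestamp': ts, 'Value': v}
             for ts, v in zip(df.index, df['colname'])]
```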
I am doing a research project and trying to pull thousands of quarterly results for companies from the SEC EDGAR API.
Each result is a list of dictionaries structured as follows:
[{'field': 'othercurrentliabilities', 'value': 6886000000.0},
{'field': 'otherliabilities', 'value': 13700000000.0},
{'field': 'propertyplantequipmentnet', 'value': 15789000000.0}...]
I want each result to be a row of a pandas dataframe. The issue is that each result may not have the same fields, depending on the data available. I would like to check whether each column (field) of the dataframe is present in a result's fields and, if it is, add the result value to the row; if not, I would like to add np.NaN. How would I go about doing this?
A list/dict comprehension ought to work here:
In [11]: s
Out[11]:
[[{'field': 'othercurrentliabilities', 'value': 6886000000.0},
{'field': 'otherliabilities', 'value': 13700000000.0},
{'field': 'propertyplantequipmentnet', 'value': 15789000000.0}],
[{'field': 'othercurrentliabilities', 'value': 6886000000.0}]]
In [12]: pd.DataFrame([{d["field"]: d["value"] for d in row} for row in s])
Out[12]:
othercurrentliabilities otherliabilities propertyplantequipmentnet
0 6.886000e+09 1.370000e+10 1.578900e+10
1 6.886000e+09 NaN NaN
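If you want the frame to always contain a fixed set of expected fields, even ones that no result happens to report, a follow-up step is to reindex the columns. A sketch (the expected list here is illustrative):

```python
import numpy as np
import pandas as pd

s = [
    [{'field': 'othercurrentliabilities', 'value': 6886000000.0},
     {'field': 'otherliabilities', 'value': 13700000000.0}],
    [{'field': 'othercurrentliabilities', 'value': 6886000000.0}],
]

expected = ['othercurrentliabilities', 'otherliabilities',
            'propertyplantequipmentnet']

df = pd.DataFrame([{d['field']: d['value'] for d in row} for row in s])
# reindex guarantees every expected column exists, filling gaps with NaN.
df = df.reindex(columns=expected)
```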
Make a list of df.result.rows[x]['values'] like below:
s = []
for x in range(df.result.totalrows[0]):
    s = s + [df.result.rows[x]['values']]
    print(x)
Then
df1 = pd.DataFrame([{d["field"]: d["value"] for d in row} for row in s])
df1
will give you the result.