I have a dictionary with the following keys and values:
my_dict = {'1': {'name': 'one',
                 'f_10': [1, 10, 20, 30],
                 'f_20': [1, 20, 40, 60]},
           '2': {'name': 'two',
                 'f_10': [2, 12, 22, 32],
                 'f_20': [2, 22, 42, 62]}}
How do I convert it to a Pandas DataFrame that will look like:
   name          f_10          f_20
1  one  [1,10,20,30]  [1,20,40,60]
2  two  [2,12,22,32]  [2,22,42,62]
Each list needs to stay in a single cell under its key's column; when I try pd.concat, the list elements get split into separate rows of the DataFrame.
Simply use orient='index' when importing your data with from_dict:
df = pd.DataFrame.from_dict(my_dict, orient='index')
This returns:
name f_10 f_20
1 one [1, 10, 20, 30] [1, 20, 40, 60]
2 two [2, 12, 22, 32] [2, 22, 42, 62]
I would parse that dictionary into a DataFrame and transpose it. For example,
pd.DataFrame(my_dict).T
Result
name f_10 f_20
1 one [1, 10, 20, 30] [1, 20, 40, 60]
2 two [2, 12, 22, 32] [2, 22, 42, 62]
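Both approaches can be checked with a small self-contained script, using the dictionary from the question:

```python
import pandas as pd

my_dict = {'1': {'name': 'one',
                 'f_10': [1, 10, 20, 30],
                 'f_20': [1, 20, 40, 60]},
           '2': {'name': 'two',
                 'f_10': [2, 12, 22, 32],
                 'f_20': [2, 22, 42, 62]}}

# Outer keys become the index, inner keys become the columns
df = pd.DataFrame.from_dict(my_dict, orient='index')

# Transposing the column-oriented constructor gives the same layout
df_t = pd.DataFrame(my_dict).T

print(df)
```

In both results each list stays intact in a single cell, which is exactly what the question asks for.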
Related
start_list = [1, 10, 20, 30]
end_list = [10, 20, 30, 40]
The data has columns including 'Measure', which ranges from 0 to 100. How can you create groups by using the start and end ranges (inclusive)?
You can use the pd.cut method:
df['category'] = pd.cut(df['Measure'], bins=start_list + [end_list[-1]])
For more customization options, see the documentation:
https://pandas.pydata.org/docs/reference/api/pandas.cut.html
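A minimal runnable sketch of that idea. It assumes the bin edges are the values of start_list plus the last value of end_list, and that the column is called 'Measure' as in the question; note these bins only cover the range 1 to 40, so values outside it would come out as NaN:

```python
import pandas as pd

start_list = [1, 10, 20, 30]
end_list = [10, 20, 30, 40]

# Hypothetical sample data for the 'Measure' column
df = pd.DataFrame({'Measure': [3, 10, 15, 25, 38]})

# Bin edges: the start of every range plus the end of the last one
bins = start_list + [end_list[-1]]  # [1, 10, 20, 30, 40]

# right=True (the default) makes each bin include its right edge
df['category'] = pd.cut(df['Measure'], bins=bins)
print(df)
```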
I am currently stuck trying to extract a value from a nested list depending on the values of a DataFrame.
Imagine I have this array. I create it manually, so I can arrange the numbers any way I want; a Python list of lists seemed the most natural choice, but any structure would do:
value = [[30, 120, 600, 3000], [15, 60, 300, 1500], [30, 120, 600, 3000], [10, 40, 200, 1000],[10, 40, 200, 1000], [10, 40, 200, 1000], [10, 40, 200, 1000], [5, 20, 100, 500]]
I also have a DataFrame that comes from much bigger, dynamic processing, with two columns of int type. Here is code to recreate those two columns as an example.
The possible values of id1 go from 0 to 6 and those of id2 from 0 to 3.
data = {'id1': [4, 2, 6, 6], 'id2': [1, 2, 3, 1]}
df = pd.DataFrame(data)
What I want to do is add a column to df whose value is looked up in the array using the two id columns.
For example, the first row of the DataFrame should take value[4][1] = 40, ending up with a DataFrame like this:
result = {'id1': [4, 2, 6, 6], 'id2': [1, 2, 3, 1], 'matched value': [40, 600, 1000, 40]}
dfresult = pd.DataFrame(result)
I am a bit lost on the best way to achieve this.
What comes to mind is a brute-force solution: flatten the multidimensional array into a single list of all 7*4 combinations, add a column to the DataFrame that concatenates the two ids, and then do a straight join on that key. That would likely work here because the possible combinations are few, but I am certain there is a learning opportunity to use lists in a dynamic way that escapes me!
You can use a list comprehension to iterate over the id pairs and retrieve the corresponding value for each pair:
df['matched_val'] = [value[i][j] for i, j in zip(df['id1'], df['id2'])]
Or a better solution with numpy indexing but applicable only if the sub-lists inside value are of equal length:
df['matched_val'] = np.array(value)[df['id1'], df['id2']]
Result
id1 id2 matched_val
0 4 1 40
1 2 2 600
2 6 3 1000
3 6 1 40
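Put together, both variants run like this (the NumPy route needs the sub-lists inside value to be the same length, which they are here):

```python
import numpy as np
import pandas as pd

value = [[30, 120, 600, 3000], [15, 60, 300, 1500], [30, 120, 600, 3000],
         [10, 40, 200, 1000], [10, 40, 200, 1000], [10, 40, 200, 1000],
         [10, 40, 200, 1000], [5, 20, 100, 500]]

df = pd.DataFrame({'id1': [4, 2, 6, 6], 'id2': [1, 2, 3, 1]})

# Pure-Python lookup: pair up the two id columns and index the nested list
df['matched_val'] = [value[i][j] for i, j in zip(df['id1'], df['id2'])]

# Equivalent vectorized lookup via NumPy fancy indexing
df['matched_val_np'] = np.array(value)[df['id1'], df['id2']]
print(df)
```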
I want to convert 3 rows into a multi-level column header in a pandas DataFrame.
A sample DataFrame:
df = pd.DataFrame({'a':['foo_0', 'bar_0', 1, 2, 3], 'b':['foo_0', 'bar_0', 11, 12, 13],
'c':['foo_1', 'bar_1', 21, 22, 23], 'd':['foo_1', 'bar_1', 31, 32, 33]})
The expected output would have those first rows as a multi-level column header (the original post illustrated this with a screenshot, with the header rows highlighted in yellow).
Thank you,
-Nilesh
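A minimal sketch of one way to do this, assuming the three header levels are the first two data rows plus the existing column names (the expected-output screenshot is not reproduced here, so the level order is an assumption):

```python
import pandas as pd

df = pd.DataFrame({'a': ['foo_0', 'bar_0', 1, 2, 3],
                   'b': ['foo_0', 'bar_0', 11, 12, 13],
                   'c': ['foo_1', 'bar_1', 21, 22, 23],
                   'd': ['foo_1', 'bar_1', 31, 32, 33]})

# First two rows plus the original column names give three header levels
header = pd.MultiIndex.from_arrays([df.iloc[0], df.iloc[1], df.columns])

# Keep only the data rows and attach the new header
out = df.iloc[2:].reset_index(drop=True)
out.columns = header
print(out)
```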
I am pretty new to Python and hence I need your help on the following:
I have two tables (dataframes):
Table 1 has all the data and it looks like this:
GenDate column has the generation day.
Date column has dates.
Column D and onwards has different values
I also have the following table:
Column I has "keywords" that can be found in the header of Table 1
Column K has dates that should be in column C of table 1
My goal is to produce a table like the following (I have omitted a few columns for illustration purposes).
Every column of Table 1 should be split based on the Type written in its header.
For example, A_Weeks: the Weeks type corresponds to 3 splits: Week1, Week2 and Week3.
Each of these splits has a specific Date.
In the new table, 3 columns should be created, using the A_ prefix plus the split name: A_Week1, A_Week2 and A_Week3.
For each of these columns, the value that corresponds to the Date of each split should be used.
I hope the explanation is clear.
Thanks
You can get the desired table with the following code (follow the comments, and check the pandas API reference to learn about the functions used):
import numpy as np
import pandas as pd

# initial data
t_1 = pd.DataFrame(
    {'GenDate': [1, 1, 1, 2, 2, 2],
     'Date': [10, 20, 30, 10, 20, 30],
     'A_Days': [11, 12, 13, 14, 15, 16],
     'B_Days': [21, 22, 23, 24, 25, 26],
     'A_Weeks': [110, 120, 130, 140, np.nan, 160],
     'B_Weeks': [210, 220, 230, 240, np.nan, 260]})
# initial data
t_2 = pd.DataFrame(
    {'Type': ['Days', 'Days', 'Days', 'Weeks', 'Weeks'],
     'Split': ['Day1', 'Day2', 'Day3', 'Week1', 'Week2'],
     'Date': [10, 20, 30, 10, 30]})
# create multiindex
t_1 = t_1.set_index(['GenDate', 'Date'])
# pivot 'Date' level of MultiIndex - unstack it from index to columns
# and drop columns that contain any NaN values
tt_1 = t_1.unstack().dropna(axis=1)
# tt_1 is what you need, with multi-level column labels
# mapping used to rename columns: {Type: {Date: Split}}
t_2 = t_2.set_index(['Type'])
mapping = {
    type_: dict(zip(
        t_2.loc[type_, 'Date'],
        t_2.loc[type_, 'Split']))
    for type_ in t_2.index.unique()}
# new column names
new_columns = list()
for letter_type, date in tt_1.columns.values:
    letter, type_ = letter_type.split('_')
    new_columns.append('{}_{}'.format(letter, mapping[type_][date]))
tt_1.columns = new_columns
I have a DataFrame from a CSV import using pandas. This DataFrame has 160 variables and I would like to keep only numbers 5, 9, 10, 46 and 89.
I tried this:
dataf2 = dataf[[5] + [9] + [10] + [46] + [89]]
but I get this error:
KeyError: '[ 5 9 10 46 89] not in index'
If you want to refer to columns not by their names but by their positions in the dataset, you need to use df.iloc:
dataf.iloc[:, [5, 9, 10, 46, 89]]
Row indices are specified before the comma, column indices are specified after the comma.
If, on the other hand, 5, 9, 10, 46 and 89 are the column labels themselves, then you can index just those directly:
dataf2 = dataf[[5, 9, 10, 46, 89]]
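The distinction between the two answers, positions vs. labels, in a tiny self-contained example (using a smaller hypothetical frame, since the 160-column CSV isn't shown):

```python
import pandas as pd

# Columns labelled 0..9 by the default RangeIndex,
# standing in for the 160-column CSV import
df = pd.DataFrame([[i * 10 + j for j in range(10)] for i in range(3)])

# Positional selection: works regardless of what the labels are
by_position = df.iloc[:, [5, 9]]

# Label selection: only works because the labels happen to be integers
by_label = df[[5, 9]]

print(by_position.equals(by_label))
```

Here the two agree because the default labels coincide with the positions; after a rename (or with string headers from a CSV), only the .iloc version would still select by position.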