I want to unstack a multi-index dataframe, which looks like this:
into another dataframe whose index is 'Worker_id', column names are 'Task_id' and values are 'Date_cnt'.
Could someone help?
I've tried df.unstack, but it automatically uses 'Date_cnt', rather than 'Task_id', as the column names.
Thanks!
I think this is what you want:
import pandas as pd
df = pd.DataFrame([[4529,338,6],[4529,340,4],[4529,346,4],[4529,388,4],[4529,824,1]], columns = ['Worker_id','Task_id','Date_cnt'])
df = df.set_index(['Worker_id','Task_id']).unstack()
df.columns = df.columns.droplevel()
print(df)
Task_id    338  340  346  388  824
Worker_id
4529         6    4    4    4    1
Because there is only one value column, 'Date_cnt' sits at the top level of the resulting column MultiIndex. If you had several value columns before unstacking, each of them would appear at that top level. Since you don't want to keep that level, you can simply drop it.
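As a variation on the answer above, the extra column level can be avoided altogether. The sketch below reuses the sample data from the answer and shows two alternatives: selecting 'Date_cnt' as a Series before unstacking, and calling pivot() on the long table. The names wide and wide_pivot are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[4529, 338, 6], [4529, 340, 4], [4529, 346, 4], [4529, 388, 4], [4529, 824, 1]],
    columns=['Worker_id', 'Task_id', 'Date_cnt'],
)

# Selecting the single value column first yields a Series, so unstack()
# produces flat 'Task_id' columns with no extra level to drop.
wide = df.set_index(['Worker_id', 'Task_id'])['Date_cnt'].unstack()

# pivot() expresses the same reshape directly from the long table.
wide_pivot = df.pivot(index='Worker_id', columns='Task_id', values='Date_cnt')

print(wide)
```

Both forms assume each (Worker_id, Task_id) pair occurs only once; with duplicates, both unstack() and pivot() raise an error and an aggregating reshape such as pivot_table() would be needed instead.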
I need to compare two dataframes, df1 (blue) and df2 (orange), store only the rows of df2 (orange) that are not in df1 in a separate dataframe, and then add those rows to df1 while assigning function 6 and sector 20 to the employees that were not present in df1 (blue).
I know how to find the differences between the data frames and store that in a third data frame, but I'm stuck trying to figure out how to store only the rows of df2 that are not in df1.
You can try this:
Get a list of the IDs in df2 (orange) that you want to keep
Filter df2 with that list
Append the result to df1
df1 --> blue, df2 --> orange
import pandas as pd
df2['Function'] = 6
df2['Sector'] = 20
ids_df2_keep = [e for e in df2['ID'] if e not in list(df1['ID'])]
df2 = df2[df2['ID'].isin(ids_df2_keep)]
df1 = pd.concat([df1, df2])  # DataFrame.append was removed in pandas 2.0
This has been answered in "pandas get rows which are NOT in other dataframe".
Store the rows the two frames share as a merge, then select the rows of df2 whose values do not appear in it.
~ negates the expression, so it selects everything that is NOT IN instead of IN.
common = df1.merge(df2,on=['ID','Name'])
df = df2[(~df2['ID'].isin(common['ID']))&(~df2['Name'].isin(common['Name']))]
This was tested using some of your data:
df1 = pd.DataFrame({'ID':[125,134,156],'Name':['John','Mary','Bill'],'func':[1,2,2]})
df2 = pd.DataFrame({'ID':[125,139,133],'Name':['John','Joana','Linda']})
Output is:
    ID   Name
1  139  Joana
2  133  Linda
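One caveat with the filter above: it tests 'ID' and 'Name' independently, so a df2 row with a new ID but a name that already appears among the common rows would also be dropped. A minimal sketch of an alternative, assuming 'ID' and 'Name' together identify an employee, uses merge with indicator=True. The 'Function' and 'Sector' column names and the sample values for df1 are made up for illustration; only the defaults 6 and 20 come from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [125, 134, 156],
                    'Name': ['John', 'Mary', 'Bill'],
                    'Function': [1, 2, 2],       # made-up sample values
                    'Sector': [10, 10, 30]})     # made-up sample values
df2 = pd.DataFrame({'ID': [125, 139, 133],
                    'Name': ['John', 'Joana', 'Linda']})

# Left-merge df2 against df1's key columns; the '_merge' column records
# whether each df2 row was found in df1 ('both') or not ('left_only').
keys = ['ID', 'Name']
marked = df2.merge(df1[keys].drop_duplicates(), on=keys, how='left', indicator=True)

# Keep only the rows that exist in df2 but not in df1.
new_rows = marked.loc[marked['_merge'] == 'left_only', df2.columns]

# Fill in the defaults from the question, then add the rows to df1.
new_rows = new_rows.assign(Function=6, Sector=20)
result = pd.concat([df1, new_rows], ignore_index=True)
print(result)
```

Because the merge compares the key columns as a pair, a row is treated as already present only when both its 'ID' and its 'Name' match the same row of df1.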
I have created a Pandas dataframe using:
df = pd.DataFrame(index=np.arange(140), columns=np.arange(20))
Which gives me an empty dataframe with 140 rows and 20 columns.
I have another dataframe with 120 columns and 20 rows, which I call df2. I would like to add these rows to fill df, but still retain the shape of 140x20.
When I use newdf = df.append(df2), I get a dataframe with 280 rows and 20 columns.
df.iloc[:len(df2), :] = df2.values
will do the job. Since both frames have the same number of columns, this assignment is safe, and the remaining rows of df stay NaN. This places the df2 records at the beginning of df. If you want them at the end instead, use df.iloc[-len(df2):, :] = df2.values
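The assignment above can be checked with a small runnable sketch. It assumes a made-up df2 with 120 rows and 20 columns, since the answer relies on both frames having the same number of columns; the values in df2 are arbitrary stand-ins.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=np.arange(140), columns=np.arange(20))

# Stand-in for df2: 120 rows x 20 columns of made-up numbers.
df2 = pd.DataFrame(np.arange(120 * 20).reshape(120, 20))

# Positional assignment: copy df2's values into the first 120 rows of df.
# Using .values bypasses label alignment, so only the shapes must agree.
df.iloc[:len(df2), :] = df2.values

print(df.shape)
# The rows beyond len(df2) were never written and are still NaN.
print(df.iloc[len(df2):].isna().all().all())
```

The shape stays at 140x20 because the assignment writes into existing cells instead of adding rows.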
I have a dataframe like the one below:
import pandas as pd
# initialize list of lists
data = [[0, 2, 3],[0,2,2],[1,1,1]]
# Create the pandas DataFrame
df1 = pd.DataFrame(data, columns = ['10028', '1090','1058'])
The catch is that the column names are dynamic: sometimes there are 3 columns, sometimes 5, and sometimes just 1.
I also have another dataframe that flags the anomalies:
# initialize list of lists
data = [[0,1,1]]
# Create the pandas DataFrame
df2 = pd.DataFrame(data, columns = ['10028', '1090','1058'])
If any column in df2 has the value 1, it is an anomaly and I have to raise an alert. The one exception: if 1090 is 1 in df2, I want to check the value of 1090 in df1, and if it is less than 4, do nothing.
As of now, I am doing it like this:
if df2.any(axis=1).any():
    print("alert")
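One possible reading of the rule described above can be sketched as follows. This is only an interpretation: it assumes that "the value of 1090 in df1" means every row of that column, so a flag on '1090' is ignored only when all of its df1 values are below 4. The helper name should_alert and its parameters are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame([[0, 2, 3], [0, 2, 2], [1, 1, 1]], columns=['10028', '1090', '1058'])
df2 = pd.DataFrame([[0, 1, 1]], columns=['10028', '1090', '1058'])

def should_alert(df1, df2, special='1090', threshold=4):
    """Hypothetical helper: True if any flagged column warrants an alert."""
    # Columns with at least one flag equal to 1 in df2.
    flagged = df2.columns[(df2 == 1).any(axis=0)]
    for col in flagged:
        # Assumption: a flag on the special column is ignored when all of
        # its values in df1 are below the threshold.
        if col == special and (df1[col] < threshold).all():
            continue
        return True
    return False

if should_alert(df1, df2):
    print("alert")
```

Because the loop only visits columns that are actually flagged in df2, it works for any number of columns, including frames where '1090' is absent.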
I have a 36 rows x 36 columns pivot table, which I build using the code below:
df_pivoted = pd.pivot_table(df,index='From',columns='To',values='count')
df_pivoted.fillna(0,inplace=True)
I transpose the same dataframe using this code:
df_trans = df_pivoted.transpose()
and want to subtract one dataframe from the other with this code:
new_pivoted = df_pivoted - df_trans
It gives me a 72 rows x 72 columns dataframe with NaN in every cell.
Then I tried other code:
delta = df_pivoted.subtract(df_trans, fill_value=0)
However, it yields a 72 rows x 72 columns dataframe that looks like this:
Please help me find the difference between the original dataframe and its transpose.
After transposing your DataFrame (the pivot table), you have a new DataFrame in which the columns have become the index and vice versa. When you subtract one DataFrame from another, pandas aligns them on both column and index labels and fills NaN wherever a label is missing from either frame.
If you need to subtract the values regardless of index and columns, use:
delta = df_pivoted.values - df_trans.values
If you want df_trans to keep the index and columns of df_pivoted:
df_trans = pd.DataFrame(data=df_pivoted.transpose().values,
                        index=df_pivoted.index,
                        columns=df_pivoted.columns)
delta = df_pivoted - df_trans
Now simple subtraction works.
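To make the alignment behaviour concrete, here is a small sketch with a made-up 3x3 stand-in for the 36x36 pivot table. The row and column labels are chosen to differ on purpose, which is an assumption about what causes the doubled shape in the question.

```python
import pandas as pd

# Made-up 3x3 stand-in for the pivot table; the 'From' and 'To' labels
# share no values, so label alignment cannot match any cell.
df_pivoted = pd.DataFrame([[0, 5, 2], [1, 0, 4], [3, 6, 0]],
                          index=pd.Index(['a', 'b', 'c'], name='From'),
                          columns=pd.Index(['A', 'B', 'C'], name='To'),
                          dtype=float)

# Label-aligned subtraction: pandas takes the union of the labels on both
# axes, so the result is twice as large and entirely NaN.
aligned = df_pivoted - df_pivoted.transpose()
print(aligned.shape)  # (6, 6)

# Positional subtraction on the underlying arrays, re-wrapped with the
# original labels.
delta = pd.DataFrame(df_pivoted.values - df_pivoted.values.T,
                     index=df_pivoted.index, columns=df_pivoted.columns)
print(delta)
```

Positional subtraction is only meaningful when the rows and columns list the same entities in the same order, which is worth checking before discarding the labels.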
Hope that helps!
After creating a DataFrame with some duplicated cell values in a column named 'keys':
import pandas as pd
df = pd.DataFrame({'keys': [1,2,2,3,3,3,3],'values':[1,2,3,4,5,6,7]})
I go ahead and create two more DataFrames which are the consolidated versions of the original DataFrame df. Those newly created DataFrames will have no duplicated cell values under the 'keys' column:
df_sum = df.groupby('keys').sum().reset_index()
df_mean = df.groupby('keys').mean().reset_index()
As you can see, the df_sum['values'] cell values were all summed together,
while the df_mean['values'] cell values were averaged with the mean() method.
Lastly I rename the 'values' column in both dataframes with:
df_sum.columns = ['keys', 'sums']
df_mean.columns = ['keys', 'means']
Now I would like to copy the df_mean['means'] column into the dataframe df_sum.
How to achieve this?
The Photoshopped image below illustrates the dataframe I would like to create. Both the 'sums' and 'means' columns are merged into a single DataFrame:
There are several ways to do this. Using the DataFrame's own merge method is a straightforward one.
df_both = df_sum.merge(df_mean, how='left', on='keys')
df_both
Out[1]:
   keys  sums  means
0     1     1    1.0
1     2     5    2.5
2     3    22    5.5
I think pandas.merge() is the function you are looking for, e.g. pd.merge(df_sum, df_mean, on="keys"). Alternatively, the same result can be computed in a single agg call, as follows:
df.groupby('keys')['values'].agg(['sum', 'mean']).reset_index()
#    keys  sum  mean
# 0     1    1   1.0
# 1     2    5   2.5
# 2     3   22   5.5
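If the 'sums' and 'means' column names from the question are wanted directly, named aggregation can produce them without a separate rename or merge. This is a sketch using the question's sample data; it relies on named aggregation, which is available in pandas 0.25 and later.

```python
import pandas as pd

df = pd.DataFrame({'keys': [1, 2, 2, 3, 3, 3, 3], 'values': [1, 2, 3, 4, 5, 6, 7]})

# Named aggregation: each keyword becomes an output column name and maps
# to a (source column, aggregation function) pair.
df_both = df.groupby('keys').agg(
    sums=('values', 'sum'),
    means=('values', 'mean'),
).reset_index()

print(df_both)
```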