I have created a Pandas dataframe using:
df = pd.DataFrame(index=np.arange(140), columns=np.arange(20))
Which gives me an empty dataframe with 140 rows and 20 columns.
I have another dataframe, df2, with 120 rows and 20 columns. I would like to use its rows to fill df while still retaining the 140x20 shape.
When I use:
newdf = df.append(df2)
I get a dataframe with 280 rows and 20 columns.
df.iloc[:len(df2), :] = df2.values
will do the job. Because both dataframes have the same number of columns, this assignment is safe. The remaining rows of df stay NaN. This writes the df2 records at the beginning of df. If you want them at the end instead, use df.iloc[-len(df2):, :] = df2.values
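A minimal sketch of the positional assignment described above, assuming df2 has 120 rows and 20 columns. The all-ones data is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Empty 140x20 target frame, as in the question
df = pd.DataFrame(index=np.arange(140), columns=np.arange(20))

# Stand-in for df2 (assumed shape: 120 rows x 20 columns)
df2 = pd.DataFrame(np.ones((120, 20)))

# Overwrite the first len(df2) rows by position; index labels are ignored
df.iloc[:len(df2), :] = df2.values

print(df.shape)  # the shape of df is unchanged
```

Using df2.values rather than df2 itself sidesteps label alignment, so the rows land where the slice says regardless of how df2 is indexed.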
I have a dataframe with 10 columns and 15 rows.
I also have an empty dataframe, df2, with only two columns (A, B). I want to copy all the values in row 0 and spread them over multiple rows in column A. The same goes for row 2 in column B, row 3 in column A, and so on. I tried many functions, but I couldn't achieve this. Any suggestions?
What you need is just to transpose the dataframe (first two rows):
import pandas as pd
import numpy as np
# generate original (test) dataframe
df = pd.DataFrame({f'col_{i+1}': np.random.randint(0, 100, 20) for i in range(10)})
# transpose the first two rows into a new dataframe
df1 = df.iloc[:2, :].T.reset_index(drop=True)
# rename columns as needed
df1.rename(columns={0: 'A', 1: 'B'}, inplace=True)
I need to compare two dataframes, df1 (blue) and df2 (orange), store only the rows of df2 (orange) that are not in df1 in a separate dataframe, and then add those rows to df1 while assigning function 6 and sector 20 to the employees that were not present in df1 (blue).
I know how to find the differences between the data frames and store that in a third data frame, but I'm stuck trying to figure out how to store only the rows of df2 that are not in df1.
Can try this:
Get a list of the IDs in orange (df2) that you want to keep
Filter df2 with that list
Append
df1 --> blue, df2 --> orange
import pandas as pd
df2['Function'] = 6
df2['Sector'] = 20
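# IDs that appear in df2 but not in df1
existing_ids = set(df1['ID'])
ids_df2_keep = [e for e in df2['ID'] if e not in existing_ids]
df2 = df2[df2['ID'].isin(ids_df2_keep)]
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df1 = pd.concat([df1, df2])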
This has been answered in pandas get rows which are NOT in other dataframe
Store it as a merge and simply select the rows that do not share common values.
~ negates the expression, select all that are NOT IN instead of IN.
common = df1.merge(df2,on=['ID','Name'])
df = df2[(~df2['ID'].isin(common['ID']))&(~df2['Name'].isin(common['Name']))]
This was tested using some of your data:
df1 = pd.DataFrame({'ID':[125,134,156],'Name':['John','Mary','Bill'],'func':[1,2,2]})
df2 = pd.DataFrame({'ID':[125,139,133],'Name':['John','Joana','Linda']})
Output is:
    ID   Name
1  139  Joana
2  133  Linda
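As an alternative sketch (not the method used in the answer above), the same "rows of df2 that are not in df1" selection can be written as a left merge with indicator=True. It uses the sample data from the answer; the names merged and only_in_df2 are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [125, 134, 156], 'Name': ['John', 'Mary', 'Bill'], 'func': [1, 2, 2]})
df2 = pd.DataFrame({'ID': [125, 139, 133], 'Name': ['John', 'Joana', 'Linda']})

# Left-join df2 against the key columns of df1; the _merge column
# records whether each df2 row also had a match in df1
keys = df1[['ID', 'Name']].drop_duplicates()
merged = df2.merge(keys, on=['ID', 'Name'], how='left', indicator=True)

# Keep the rows that exist only in df2, then drop the helper column
only_in_df2 = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(only_in_df2)
```

This matches on the (ID, Name) pair as a unit, whereas two separate isin checks test each column independently.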
I have a 36 rows x 36 columns pivot-table dataframe, which I create using the code below:
df_pivoted = pd.pivot_table(df,index='From',columns='To',values='count')
df_pivoted.fillna(0,inplace=True)
I transpose the same dataframe using this code:
df_trans = df_pivoted.transpose()
and want to subtract one dataframe from the other with this code:
new_pivoted = df_pivoted - df_trans
It gives me a 72 rows x 72 columns dataframe with NaN in every cell.
Then I tried this other code:
delta = df_pivoted.subtract(df_trans, fill_value=0)
However, it also yields a 72 rows x 72 columns dataframe, which looks like this:
Please help me find the difference between the original dataframe and its transpose.
After you transpose your DataFrame (the pivot table), you have a new DataFrame in which the columns have become the index and vice versa. When you subtract one DataFrame from another, pandas aligns them on both column and index labels and fills every position that has no match with NaN.
If you need to subtract the values regardless of index and columns, use:
delta = df_pivoted.values - df_trans.values
If you want df_trans to carry the index and columns of df_pivoted:
df_trans = pd.DataFrame(data=df_pivoted.transpose().values,
                        index=df_pivoted.index,
                        columns=df_pivoted.columns)
delta = df_pivoted - df_trans
Now simple subtraction works.
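To illustrate the alignment behaviour described above, here is a small sketch with a hypothetical 2x2 frame whose row labels and column labels differ. The labels r1, r2, c1, c2 are invented for the example.

```python
import pandas as pd

# Hypothetical 2x2 "pivot" whose row and column labels do not overlap
pivoted = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                       index=['r1', 'r2'], columns=['c1', 'c2'])
trans = pivoted.transpose()

# Label-aligned subtraction: no (row, column) pair exists in both frames,
# so the result spans the union of the labels and every cell is NaN
aligned = pivoted - trans
print(aligned.shape)  # (4, 4)

# Positional subtraction on the underlying arrays ignores the labels
delta = pd.DataFrame(pivoted.values - trans.values,
                     index=pivoted.index, columns=pivoted.columns)
print(delta)
```

The doubled shape here has the same cause as the 72 x 72 result in the question: the two frames share no matching labels on either axis.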
Hope that helps!
I have two dataframes:
df1 shape = (101, 4825)
df2 shape = (97, 5818)
The first 4825 column names of df2 are the same as those of df1; after that, the column names keep increasing by 1.
However, at the end of both dataframes, there is a column named Group_number.
I want to concatenate the two dataframes so that the final dataframe has shape (198, 5818), i.e. it has all the rows of both dataframes, with NaN values in the df1 rows for the columns after the initial 4825.
I tried pd.concat([df1,df2]) but the column Group_number gets mixed up.
This could be happening because of an index problem as well. Use the ignore_index argument:
pd.concat([df1,df2], ignore_index=True)
or you can test with the keys argument, so that you know which original dataframe each observation came from (the keys are discarded when ignore_index=True, so leave it out here):
pd.concat([df1, df2], keys=['a', 'b'])
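A minimal sketch of how pd.concat matches columns by name, using two tiny stand-in frames rather than the real (101, 4825) and (97, 5818) data. The column names a and b and the final reordering step are assumptions for illustration.

```python
import pandas as pd

# Small stand-ins: df2 has every column of df1 plus one extra column
df1 = pd.DataFrame({'a': [1, 2], 'Group_number': [0, 0]})
df2 = pd.DataFrame({'a': [3], 'b': [4], 'Group_number': [1]})

# Columns are matched by name; cells that df1 lacks become NaN
out = pd.concat([df1, df2], ignore_index=True, sort=False)

# Optionally move Group_number back to the last position
other_cols = [c for c in out.columns if c != 'Group_number']
out = out[other_cols + ['Group_number']]
print(out)
```

Because matching is by name, Group_number stays a single column; only its position among the columns may change, which the last step corrects.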
I want to unstack a multi-index dataframe, which looks like this:
into another dataframe whose index is 'Worker_id', column names are 'Task_id' and values are 'Date_cnt'.
Could someone help?
I've tried df.unstack, but it automatically uses 'Date_cnt' rather than 'Task_id' as the column names.
Thanks!
I think this is what you want:
import pandas as pd
df = pd.DataFrame([[4529,338,6],[4529,340,4],[4529,346,4],[4529,388,4],[4529,824,1]], columns = ['Worker_id','Task_id','Date_cnt'])
df = df.set_index(['Worker_id','Task_id']).unstack()
df.columns = df.columns.droplevel()
print(df)
Task_id    338  340  346  388  824
Worker_id
4529         6    4    4    4    1
Because there is only one value column, Date_cnt ends up as the top level of the column MultiIndex. If you had multiple value columns before unstacking, they would all appear at that top level. Since you don't want to keep it, you can just drop that level.
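As an alternative sketch (not what the answer above uses), DataFrame.pivot produces the same wide layout in one step and never creates the extra column level. It assumes each (Worker_id, Task_id) pair occurs only once; the name wide is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[4529, 338, 6], [4529, 340, 4], [4529, 346, 4], [4529, 388, 4], [4529, 824, 1]],
    columns=['Worker_id', 'Task_id', 'Date_cnt'],
)

# Rows come from Worker_id, columns from Task_id, cell values from Date_cnt
wide = df.pivot(index='Worker_id', columns='Task_id', values='Date_cnt')
print(wide)
```

If a (Worker_id, Task_id) pair can repeat, pivot raises an error; in that case pivot_table with an aggregation function is the usual choice.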