I have a DataFrame that looks like this:
   Col2  Col3
0     5     8
1     1     0
2     3     5
3     4     1
4     0     7
How can I sum the values and get rid of the index, so that it looks like this?
 Col2  Col3
   13    21
Sample code:
import pandas as pd
df = pd.DataFrame()
df["Col1"] = [0,2,4,6,2]
df["Col2"] = [5,1,3,4,0]
df["Col3"] = [8,0,5,1,7]
df["Col4"] = [1,4,6,0,8]
df_new = df.iloc[:, 1:3]
print(df_new)
Use .sum() to get the sums for each column. It produces a Series indexed by the column names. Wrap it in a DataFrame and transpose, so each label becomes a column, then use .to_string(index=False) to print without the index (note it should be df_new here, since df still contains Col1 and Col4):
print(pd.DataFrame(df_new.sum()).T.to_string(index=False))
This outputs:
 Col2  Col3
   13    21
You can try:
df_new.sum(axis=0)
This returns the column sums as a Series whose index is the column names. Be aware that
df_new.reset_index(drop=True, inplace=True)
only resets the row index of df_new itself; it does not remove the labels from the summed Series.
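Putting the pieces together, here is a minimal, self-contained sketch of the approach above, using the sample data from the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": [0, 2, 4, 6, 2],
    "Col2": [5, 1, 3, 4, 0],
    "Col3": [8, 0, 5, 1, 7],
    "Col4": [1, 4, 6, 0, 8],
})

df_new = df.iloc[:, 1:3]        # keep Col2 and Col3 only
sums = df_new.sum()             # Series indexed by column name
out = sums.to_frame().T         # one-row DataFrame
print(out.to_string(index=False))
```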
index  col1  col2  col3
    0     0     1     0
    1     1     0     1
    2     1     1     0
I am stuck on a task: finding the locations (indices) of all cells that equal 1.
I tried the following:
column_result = []
row_result = []
for column in df:
    column_result = column_result.append(df.index[df[column] != 0])
for row in df:
    row_result = row_result.append(df.index[df[row] != 0])
My logic is to use loops to traverse the columns and rows separately and concatenate the results later. However, it raises 'NoneType' object has no attribute 'append'. Would you please help me debug and complete this task?
The error comes from list.append: it mutates the list in place and returns None, so reassigning its result replaces your list with None. A simpler approach is numpy.where, which gives the positional row and column indices of the matching cells; use them to select the idx and cols lists:
import numpy as np

i, c = np.where(df.ne(0))
cols = df.columns[c].tolist()
idx = df.index[i].tolist()
print (idx)
[0, 1, 1, 2, 2]
print (cols)
['col2', 'col1', 'col3', 'col1', 'col2']
Or use DataFrame.stack and filter to build the final DataFrame:
s = df.stack()
df1 = s[s.ne(0)].rename_axis(['idx','cols']).index.to_frame(index=False)
print (df1)
idx cols
0 0 col2
1 1 col1
2 1 col3
3 2 col1
4 2 col2
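A small sketch tying the two answers together: pairing each non-zero cell's row label with its column label (variable names are mine), using the 3x3 frame from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [0, 1, 1],
                   "col2": [1, 0, 1],
                   "col3": [0, 1, 0]})

# Positional indices of non-zero cells, in row-major order
i, c = np.where(df.ne(0))
# Map positions back to labels and pair them up
locations = list(zip(df.index[i], df.columns[c]))
print(locations)  # [(0, 'col2'), (1, 'col1'), (1, 'col3'), (2, 'col1'), (2, 'col2')]
```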
I have a data frame
index  col1  col2  col3
    0     1     3     5
    1    12     7    21
  ...   ...   ...   ...
I want to delete some rows, with the criteria being that the values in col1 and col2 show up in a certain list.
Let the list be [(12,7),(100,34),...].
In this case, the row with index 1 would be deleted.
Build a MultiIndex from both columns with DataFrame.set_index, test membership with Index.isin, invert the mask with ~, and filter with boolean indexing:
L = [(12,7),(100,34)]
df = df[~df.set_index(['col1','col2']).index.isin(L)]
print (df)
   col1  col2  col3
0     1     3     5
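For reference, a self-contained sketch of this filter with the question's data:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 12], "col2": [3, 7], "col3": [5, 21]})
L = [(12, 7), (100, 34)]

# Membership test against a MultiIndex built from both columns
mask = df.set_index(["col1", "col2"]).index.isin(L)
df = df[~mask]
print(df)  # only the (1, 3, 5) row survives
```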
I have 2 dataframes with the same column names, for example:
col1  col2  col3
   1     2     3
and
col1  col2  col3
   4     5     6
   1     7     8
I have appended them, so now the new dataframe is like below:
col1  col2  col3
   1     2     3
   4     5     6
   1     7     8
The problem is that I need the rows that have the same value in col1 to come one after the other, like this:
col1  col2  col3
   1     2     3
   1     7     8
   4     5     6
How can I sort the dataframe by col1 to create this effect (without changing the DataFrame's type)?
Use DataFrame.sort_values:
df = pd.concat([df1, df2]).sort_values('col1', ignore_index=True)
If you care about ensuring that rows from the first DataFrame come before tied rows from the second, use the stable 'mergesort' algorithm; the default algorithm orders tied values arbitrarily. Note that axis must stay at its default of 0 to sort rows by a column:
df.sort_values(by='col1', ascending=True, inplace=True, kind='mergesort')
You can sort a DataFrame by any column, for example:
df.sort_values(by=['col_1', 'col_2'], ascending=[True, False], inplace=True)
After that you may like to reset the row index, as they will be jumbled up:
df.reset_index(drop=True, inplace=True)
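To illustrate why the stable 'mergesort' matters, here is a small sketch with the question's two frames; the tied col1 == 1 rows keep their df1-before-df2 order:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1], "col2": [2], "col3": [3]})
df2 = pd.DataFrame({"col1": [4, 1], "col2": [5, 7], "col3": [6, 8]})

df = pd.concat([df1, df2], ignore_index=True)
# 'mergesort' is stable: tied values keep their original relative order
df = df.sort_values("col1", kind="mergesort", ignore_index=True)
print(df)
```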
I have a pandas dataframe:
   Col1  Col2  Col3
0     1     2     3
1     2     3     4
And I want to add a new row summing over two columns [Col1,Col2] like:
       Col1  Col2  Col3
0         1     2     3
1         2     3     4
Total     3     5   NaN
Ignoring Col3. What should I do? Thanks in advance.
You can use the pandas.DataFrame.append and pandas.DataFrame.sum methods:
import numpy as np

df2 = df.append(df.sum(), ignore_index=True)
df2.iloc[-1, df2.columns.get_loc('Col3')] = np.nan
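Note that DataFrame.append was removed in pandas 2.0. A pd.concat equivalent of the snippet above might look like this, summing only Col1 and Col2 so Col3 is left as NaN:

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2], "Col2": [2, 3], "Col3": [3, 4]})

# Sum only the wanted columns; the missing Col3 becomes NaN on concat
total = df[["Col1", "Col2"]].sum().rename("Total")
df2 = pd.concat([df, total.to_frame().T])
print(df2)
```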
You can use pd.DataFrame.loc. Note the final column will be converted to float since NaN is considered float:
import numpy as np
df.loc['Total'] = [df['Col1'].sum(), df['Col2'].sum(), np.nan]
df[['Col1', 'Col2']] = df[['Col1', 'Col2']].astype(int)
print(df)
       Col1  Col2  Col3
0         1     2   3.0
1         2     3   4.0
Total     3     5   NaN
If I have:
     col1    col2
0       1  np.nan
1       2  np.nan
2  np.nan       3
4  np.nan       4
How would I efficiently get to:
     col1    col2  col3
0       1  np.nan     1
1       2  np.nan     2
2  np.nan       3     3
4  np.nan       4     4
My current solution is:
import numpy as np
import pandas as pd

test = pd.Series([1, 2, np.nan, np.nan])
test2 = pd.Series([np.nan, np.nan, 3, 4])
temp_df = pd.concat([test, test2], axis=1)

init_cols = list(temp_df.columns)
temp_df['test3'] = ""
for col in init_cols:
    mask = temp_df[col].fillna("") != ""
    temp_df.loc[mask, 'test3'] = list(temp_df.loc[mask, col])
Ideally I would like to avoid the use of loops.
It depends on what you want to do in the event that both columns have a non-null value.
Take col1 first, then fill missing values from col2:
df['col3'] = df.col1.fillna(df.col2)
Take col2 first, then fill missing values from col1:
df['col3'] = df.col2.fillna(df.col1)
Average the overlap:
df['col3'] = df.mean(axis=1)
Sum the overlap:
df['col3'] = df.sum(axis=1)
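A runnable sketch of the first option with the question's data, avoiding the loop entirely:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, np.nan, np.nan],
                   "col2": [np.nan, np.nan, 3, 4]})

# Take col1 where present, otherwise fall back to col2
df["col3"] = df["col1"].fillna(df["col2"])
print(df["col3"].tolist())  # [1.0, 2.0, 3.0, 4.0]
```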