I have a DataFrame that looks like this:
   Col2  Col3
0     5     8
1     1     0
2     3     5
3     4     1
4     0     7
How can I sum the values and get rid of the index, so that it looks like this?
 Col2  Col3
   13    21
Sample code:
import pandas as pd
df = pd.DataFrame()
df["Col1"] = [0,2,4,6,2]
df["Col2"] = [5,1,3,4,0]
df["Col3"] = [8,0,5,1,7]
df["Col4"] = [1,4,6,0,8]
df_new = df.iloc[:, 1:3]
print(df_new)
Use .sum() to get the sums for each column. It produces a Series indexed by the column names. Wrap it in a DataFrame and transpose, so each label becomes a column, then use .to_string(index=False) to print without the index (note it should be df_new here, since df still contains Col1 and Col4):
print(pd.DataFrame(df_new.sum()).T.to_string(index=False))
This outputs:
 Col2  Col3
   13    21
You can try:
df_new.sum(axis=0)
This returns the column sums as a Series whose index is the column names. Be aware that
df_new.reset_index(drop=True, inplace=True)
only resets the row index of df_new itself; it does not remove the labels from the summed Series.
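Putting the pieces together, here is a minimal, self-contained sketch of the approach above, using the sample data from the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": [0, 2, 4, 6, 2],
    "Col2": [5, 1, 3, 4, 0],
    "Col3": [8, 0, 5, 1, 7],
    "Col4": [1, 4, 6, 0, 8],
})

df_new = df.iloc[:, 1:3]        # keep Col2 and Col3 only
sums = df_new.sum()             # Series indexed by column name
out = sums.to_frame().T         # one-row DataFrame
print(out.to_string(index=False))
```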
index  col1  col2  col3
    0     0     1     0
    1     1     0     1
    2     1     1     0
I am stuck on a task: finding the locations (indices) of all cells that equal 1.
I tried the following:
column_result = []
row_result = []
for column in df:
    column_result = column_result.append(df.index[df[column] != 0])
for row in df:
    row_result = row_result.append(df.index[df[row] != 0])
My logic is to use loops to traverse the columns and rows separately and concatenate the results later. However, it raises 'NoneType' object has no attribute 'append'. Would you please help me debug and complete this task?
The error comes from list.append: it mutates the list in place and returns None, so reassigning its result replaces your list with None. A simpler approach is numpy.where, which gives the positional row and column indices of the matching cells; use them to select the idx and cols lists:
import numpy as np

i, c = np.where(df.ne(0))
cols = df.columns[c].tolist()
idx = df.index[i].tolist()
print (idx)
[0, 1, 1, 2, 2]
print (cols)
['col2', 'col1', 'col3', 'col1', 'col2']
Or use DataFrame.stack and filter to build the final DataFrame:
s = df.stack()
df1 = s[s.ne(0)].rename_axis(['idx','cols']).index.to_frame(index=False)
print (df1)
idx cols
0 0 col2
1 1 col1
2 1 col3
3 2 col1
4 2 col2
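A small sketch tying the two answers together: pairing each non-zero cell's row label with its column label (variable names are mine), using the 3x3 frame from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [0, 1, 1],
                   "col2": [1, 0, 1],
                   "col3": [0, 1, 0]})

# Positional indices of non-zero cells, in row-major order
i, c = np.where(df.ne(0))
# Map positions back to labels and pair them up
locations = list(zip(df.index[i], df.columns[c]))
print(locations)  # [(0, 'col2'), (1, 'col1'), (1, 'col3'), (2, 'col1'), (2, 'col2')]
```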
I have a data frame
index  col1  col2  col3
    0     1     3     5
    1    12     7    21
  ...   ...   ...   ...
I want to delete some rows, with the criteria being that the values in col1 and col2 show up in a certain list.
Let the list be [(12,7),(100,34),...].
In this case, the row with index 1 would be deleted.
Build a MultiIndex from both columns with DataFrame.set_index, test membership with Index.isin, invert the mask with ~, and filter with boolean indexing:
L = [(12,7),(100,34)]
df = df[~df.set_index(['col1','col2']).index.isin(L)]
print (df)
   col1  col2  col3
0     1     3     5
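For reference, a self-contained sketch of this filter with the question's data:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 12], "col2": [3, 7], "col3": [5, 21]})
L = [(12, 7), (100, 34)]

# Membership test against a MultiIndex built from both columns
mask = df.set_index(["col1", "col2"]).index.isin(L)
df = df[~mask]
print(df)  # only the (1, 3, 5) row survives
```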
I have 2 dataframes with the same column names, for example:
col1  col2  col3
   1     2     3
and
col1  col2  col3
   4     5     6
   1     7     8
I have appended them, so now the new dataframe is like below:
col1  col2  col3
   1     2     3
   4     5     6
   1     7     8
The problem is that I need the rows that have the same value in col1 to come one after the other, like this:
col1  col2  col3
   1     2     3
   1     7     8
   4     5     6
How can I sort the dataframe by col1 to create this effect (without changing the DataFrame's type)?
Use DataFrame.sort_values:
df = pd.concat([df1, df2]).sort_values('col1', ignore_index=True)
If you care about ensuring that rows from the first DataFrame come before tied rows from the second, use the stable 'mergesort' algorithm; the default algorithm orders tied values arbitrarily. Note that axis must stay at its default of 0 to sort rows by a column:
df.sort_values(by='col1', ascending=True, inplace=True, kind='mergesort')
You can sort a DataFrame by any column, for example:
df.sort_values(by=['col_1', 'col_2'], ascending=[True, False], inplace=True)
After that you may like to reset the row index, as they will be jumbled up:
df.reset_index(drop=True, inplace=True)
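To illustrate why the stable 'mergesort' matters, here is a small sketch with the question's two frames; the tied col1 == 1 rows keep their df1-before-df2 order:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1], "col2": [2], "col3": [3]})
df2 = pd.DataFrame({"col1": [4, 1], "col2": [5, 7], "col3": [6, 8]})

df = pd.concat([df1, df2], ignore_index=True)
# 'mergesort' is stable: tied values keep their original relative order
df = df.sort_values("col1", kind="mergesort", ignore_index=True)
print(df)
```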
I have a pandas dataframe:
   Col1  Col2  Col3
0     1     2     3
1     2     3     4
And I want to add a new row summing over two columns [Col1,Col2] like:
       Col1  Col2  Col3
0         1     2     3
1         2     3     4
Total     3     5   NaN
Ignoring Col3. What should I do? Thanks in advance.
You can use the pandas.DataFrame.append and pandas.DataFrame.sum methods:
import numpy as np

df2 = df.append(df.sum(), ignore_index=True)
df2.iloc[-1, df2.columns.get_loc('Col3')] = np.nan
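Note that DataFrame.append was removed in pandas 2.0. A pd.concat equivalent of the snippet above might look like this, summing only Col1 and Col2 so Col3 is left as NaN:

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2], "Col2": [2, 3], "Col3": [3, 4]})

# Sum only the wanted columns; the missing Col3 becomes NaN on concat
total = df[["Col1", "Col2"]].sum().rename("Total")
df2 = pd.concat([df, total.to_frame().T])
print(df2)
```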
You can use pd.DataFrame.loc. Note the final column will be converted to float since NaN is considered float:
import numpy as np
df.loc['Total'] = [df['Col1'].sum(), df['Col2'].sum(), np.nan]
df[['Col1', 'Col2']] = df[['Col1', 'Col2']].astype(int)
print(df)
       Col1  Col2  Col3
0         1     2   3.0
1         2     3   4.0
Total     3     5   NaN
If I have:
     col1    col2
0       1  np.nan
1       2  np.nan
2  np.nan       3
4  np.nan       4
How would I efficiently get to:
     col1    col2  col3
0       1  np.nan     1
1       2  np.nan     2
2  np.nan       3     3
4  np.nan       4     4
My current solution is:
import numpy as np
import pandas as pd

test = pd.Series([1, 2, np.nan, np.nan])
test2 = pd.Series([np.nan, np.nan, 3, 4])
temp_df = pd.concat([test, test2], axis=1)

init_cols = list(temp_df.columns)
temp_df['test3'] = ""
for col in init_cols:
    mask = temp_df[col].fillna("") != ""
    temp_df.loc[mask, 'test3'] = list(temp_df.loc[mask, col])
Ideally I would like to avoid the use of loops.
It depends on what you want to do in the event that both columns have a non-null value.
Take col1 first, then fill missing values from col2:
df['col3'] = df.col1.fillna(df.col2)
Take col2 first, then fill missing values from col1:
df['col3'] = df.col2.fillna(df.col1)
Average the overlap:
df['col3'] = df.mean(axis=1)
Sum the overlap:
df['col3'] = df.sum(axis=1)
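A runnable sketch of the first option with the question's data, avoiding the loop entirely:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, np.nan, np.nan],
                   "col2": [np.nan, np.nan, 3, 4]})

# Take col1 where present, otherwise fall back to col2
df["col3"] = df["col1"].fillna(df["col2"])
print(df["col3"].tolist())  # [1.0, 2.0, 3.0, 4.0]
```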