How to make a sum row for two columns of a pandas dataframe - python

I have a pandas dataframe:
   Col1  Col2  Col3
0     1     2     3
1     2     3     4
And I want to add a new row summing over two columns [Col1,Col2] like:
       Col1  Col2  Col3
0         1     2     3
1         2     3     4
Total     3     5   NaN
Ignoring Col3. What should I do? Thanks in advance.

You can use pandas.DataFrame.sum together with pandas.concat (pandas.DataFrame.append was deprecated and removed in pandas 2.0):
import numpy as np
import pandas as pd

df2 = pd.concat([df, df.sum().to_frame().T], ignore_index=True)
df2.iloc[-1, df2.columns.get_loc('Col3')] = np.nan

You can use pd.DataFrame.loc. Note that adding the row upcasts the columns to float (NaN is a float), so Col1 and Col2 are converted back to int afterwards:
import numpy as np
df.loc['Total'] = [df['Col1'].sum(), df['Col2'].sum(), np.nan]
df[['Col1', 'Col2']] = df[['Col1', 'Col2']].astype(int)
print(df)
       Col1  Col2  Col3
0         1     2   3.0
1         2     3   4.0
Total     3     5   NaN
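If you want to avoid overwriting Col3 after the fact, a minimal self-contained sketch (an alternative not taken from the answers above, using the question's sample data) is to sum only the two columns up front and let Col3 fall out as NaN:
import pandas as pd

df = pd.DataFrame({"Col1": [1, 2], "Col2": [2, 3], "Col3": [3, 4]})

# Sum only Col1 and Col2; the resulting row has no Col3, so concat fills it with NaN.
total = df[["Col1", "Col2"]].sum().rename("Total")
out = pd.concat([df, total.to_frame().T])
print(out)
#        Col1  Col2  Col3
# 0         1     2   3.0
# 1         2     3   4.0
# Total     3     5   NaN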

Related

How to swap column1 value with column2 value under a condition in Pandas

I'd like to swap column1 value with column2 value if column1.value >= 14 in pandas!
col1  col2
  16     1
   3     2
   4     3
This should become:
col1  col2
   1    16
   3     2
   4     3
Thanks!
Use Series.mask and re-assign the two columns' values:
m = df["col1"].ge(14)
out = df.assign(
    col1=df["col1"].mask(m, df["col2"]),
    col2=df["col2"].mask(m, df["col1"])
)
Output:
   col1  col2
0     1    16
1     3     2
2     4     3
Simple one-liner solution:
df.loc[df['col1'] >= 14, ['col1', 'col2']] = df.loc[df['col1'] >= 14, ['col2', 'col1']].values
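A numpy-based variant of the same idea, in case you prefer to build both new columns in one shot (a sketch, not from the answers above, assuming the sample data shown in the question):
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [16, 3, 4], "col2": [1, 2, 3]})

# Where the mask is True, take the two columns in swapped order; otherwise keep them as-is.
m = (df["col1"] >= 14).to_numpy()
df[["col1", "col2"]] = np.where(m[:, None], df[["col2", "col1"]], df[["col1", "col2"]])
print(df)
#    col1  col2
# 0     1    16
# 1     3     2
# 2     4     3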

How to divide rows in a pandas dataframe pairwise

I have a pandas dataframe, test, looking like the following:
Col1  Col2  Col3
   A     4     6
   A     8    36
   B     1     4
   B     6     8
Now, I want to pairwise divide the rows of the dataframe resulting in:
Col1  Col2  Col3
   A     2     6
   B     6     2
Hence I want to divide the second row of each pair by the first. I am trying to use groupby, but without success.
Does anyone have a solution?
If you always have a pair of rows, you can just try iloc:
(df.iloc[1::2, 1:]
 .div(df.iloc[::2, 1:].to_numpy())
 .assign(Col1=df.iloc[1::2, 0])
)
If the Col1 values don't repeat beyond each pair, you can group by Col1:
def divide(group):
    # You could also use head(1)/tail(1) or first()/last().
    return group.iloc[-1] / group.iloc[0]

df_ = df.groupby('Col1')[['Col2', 'Col3']].apply(divide).reset_index()
# print(df_)
  Col1  Col2  Col3
0    A   2.0   6.0
1    B   6.0   2.0
Another option is to group on the first column and use nth to divide (note this relies on the older GroupBy.nth behaviour, where nth returned a frame indexed by the group keys; in pandas 2.0+ nth acts as a row filter):
g = df.groupby("Col1")
out = g.nth(1).div(g.nth(0)).reset_index()
print(out)
  Col1  Col2  Col3
0    A   2.0   6.0
1    B   6.0   2.0
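For a version-stable take on the groupby idea, here is a self-contained sketch (using the question's sample data) that divides each group's last row by its first via GroupBy.last and GroupBy.first, avoiding nth's older indexing behaviour:
import pandas as pd

df = pd.DataFrame({"Col1": ["A", "A", "B", "B"],
                   "Col2": [4, 8, 1, 6],
                   "Col3": [6, 36, 4, 8]})

# last() and first() are both indexed by Col1, so the division aligns group-wise.
g = df.groupby("Col1")[["Col2", "Col3"]]
out = (g.last() / g.first()).reset_index()
print(out)
#   Col1  Col2  Col3
# 0    A   2.0   6.0
# 1    B   6.0   2.0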

How to switch n columns to rows of an r-row pandas dataframe (n*r rows in the final dataframe)?

Let's take this dataframe:
pd.DataFrame(dict(Col1=["a","c"],Col2=["b","d"],Col3=[1,3],Col4=[2,4]))
  Col1 Col2  Col3  Col4
0    a    b     1     2
1    c    d     3     4
I would like to have one row per value in column Col1 and column Col2 (n=2 and r=2, so the expected dataframe has 2*2 = 4 rows).
Expected result :
    Ind Value  Col3  Col4
0  Col1     a     1     2
1  Col1     c     3     4
2  Col2     b     1     2
3  Col2     d     3     4
How could I do this, please?
Pandas melt does the job here; the rest just has to do with repositioning and renaming the columns appropriately.
Use pandas melt to transform the dataframe, using Col3 and Col4 as the id variables. melt converts from wide to long format.
Next step - reindex the columns, with variable and value as lead columns.
Finally, rename the columns appropriately.
(df.melt(id_vars=['Col3', 'Col4'])
   .reindex(['variable', 'value', 'Col3', 'Col4'], axis=1)
   .rename({'variable': 'Ind', 'value': 'Value'}, axis=1)
)
    Ind Value  Col3  Col4
0  Col1     a     1     2
1  Col1     c     3     4
2  Col2     b     1     2
3  Col2     d     3     4
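As a small follow-up, melt can also name the new columns directly through its var_name and value_name parameters, which removes the need for a separate rename step (a sketch on the question's sample frame):
import pandas as pd

df = pd.DataFrame(dict(Col1=["a", "c"], Col2=["b", "d"], Col3=[1, 3], Col4=[2, 4]))

# var_name/value_name label the melted columns up front; the trailing selection
# just puts Ind and Value ahead of the id columns.
out = (df.melt(id_vars=["Col3", "Col4"], var_name="Ind", value_name="Value")
         [["Ind", "Value", "Col3", "Col4"]])
print(out)
#     Ind Value  Col3  Col4
# 0  Col1     a     1     2
# 1  Col1     c     3     4
# 2  Col2     b     1     2
# 3  Col2     d     3     4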

Deleting rows from a pandas DataFrame which do not match a combination of columns in another DataFrame

My data frame 1 looks like:
  Col1  Col2 Col3
1    A     4   ab
2    A     5   de
3    A     2   ah
4    B     1   ac
5    B     3   jd
6    B     2   am
data frame 2:
  col1  col2
1    A     4
2    B     3
How do I delete all the rows in data frame 1 which do not match the combination of rows of data frame 2?
Output Expected:
  Col1  Col2 Col3
1    A     4   ab
2    B     3   jd
Use DataFrame.merge with an inner join; it is only necessary to rename the columns first:
df = df2.rename(columns={'col1':'Col1','col2':'Col2'}).merge(df1, on=['Col1','Col2'])
#on can be omitted, then merge uses the intersection of the columns of df1 and df2
#df = df2.rename(columns={'col1':'Col1','col2':'Col2'}).merge(df1)
print (df)
  Col1  Col2 Col3
0    A     4   ab
1    B     3   jd
Another idea is to use the left_on and right_on parameters and then remove the columns named in df2.columns:
df = (df2.merge(df1, left_on=['col1','col2'],
                right_on=['Col1','Col2']).drop(df2.columns, axis=1))
print (df)
  Col1  Col2 Col3
0    A     4   ab
1    B     3   jd
If the column names are the same:
print (df2)
  Col1  Col2
1    A     4
2    B     3
df = df2.merge(df1, on=['Col1','Col2'])
#df = df2.merge(df1)
print (df)
  Col1  Col2 Col3
0    A     4   ab
1    B     3   jd
You can also use join to do an inner join (setting the key columns of df2 as its index so the join can match on them):
dfR = df1.join(df2.set_index(['col1', 'col2']), on=['Col1', 'Col2'], how='inner')
dfR[['Col1', 'Col2', 'Col3']]
This will also give you the same result:
  Col1  Col2 Col3
1    A     4   ab
2    B     3   jd
For more details, check the pandas join documentation and examples.
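If you would rather filter df1 instead of merging (so its index and column order are untouched), a sketch using a MultiIndex membership test also works; the frames below just mirror the question's data:
import pandas as pd

df1 = pd.DataFrame({"Col1": list("AAABBB"),
                    "Col2": [4, 5, 2, 1, 3, 2],
                    "Col3": ["ab", "de", "ah", "ac", "jd", "am"]})
df2 = pd.DataFrame({"col1": ["A", "B"], "col2": [4, 3]})

# Build the allowed (Col1, Col2) pairs from df2 and keep only the matching rows of df1.
keys = pd.MultiIndex.from_frame(df2)
mask = pd.MultiIndex.from_frame(df1[["Col1", "Col2"]]).isin(keys)
out = df1[mask]
print(out)
#   Col1  Col2 Col3
# 0    A     4   ab
# 4    B     3   jd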

Removing rows of a pandas input file

I am reading files in pandas for which the column names do not start on row one; instead, there is a headline/name row on line 1 of data.csv:
>>> df = pd.read_csv("data.csv")
>>> df
  Unnamed: 0 Unnamed: 1  name Unnamed: 3
0       col1       col2  col3       col4
1          1          2     3          4
2          2          5     4          6
In this case, how can I delete the row with the headline/names and make sure the actual column names are col1, col2, etc.?
Thanks in advance
You can choose to skip rows:
You can pass either specific line numbers to skip or a number of lines to skip. If you use specific row numbers, pass a list to skiprows. In your case you could use the following to be certain things are read correctly:
pd.read_csv("data.csv",header=[0], skiprows=[0])
Data:
I used the following data stored in a file called data.csv
,,name,
0, col1, col2, col3, col4,
1, 1, 2, 3, 4,
2, 2, 5, 4, 6
Output:
   0  col1  col2  col3  col4  Unnamed: 5
0  1     1     2     3     4         NaN
1  2     2     5     4     6         NaN
From the docs:
"Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file."
See the pandas read_csv documentation for reference.
Assuming your data is in data.csv, you can use the code below:
df = pd.read_csv("data.csv", skiprows=1)
Output:
   col1  col2  col3  col4  Unnamed: 4  Unnamed: 5  Unnamed: 6
0     1     2     3     4         NaN         NaN         NaN
1     2     5     4     6         NaN         NaN         NaN
Remove the unwanted columns with
df = df.dropna(axis=1)
print(df)
Output:
   col1  col2  col3  col4
0     1     2     3     4
1     2     5     4     6
As @jpp pointed out, you can also achieve this in one step as follows:
df = pd.read_csv("data.csv", skiprows=1, usecols=['col1', 'col2', 'col3', 'col4'])
Refer to read_csv() and dropna() for more information.
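An equivalent one-step approach is to point header at the second physical line instead of skipping the first. A small sketch with an in-memory file follows; the exact file contents are an assumption based on the question's description:
import io
import pandas as pd

# Hypothetical contents of data.csv: a headline row followed by the real header.
raw = """headline,,name,
col1,col2,col3,col4
1,2,3,4
2,5,4,6
"""

# header=1 uses the second line (0-indexed row 1) as the header and discards
# everything above it, so no separate skiprows step is needed.
df = pd.read_csv(io.StringIO(raw), header=1)
print(df)
#    col1  col2  col3  col4
# 0     1     2     3     4
# 1     2     5     4     6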
