My df looks as follows:
Roll Name Age Physics English Maths
0 A1 Max 16 87 79 90
1 A2 Lisa 15 47 75 60
2 A3 Luna 17 83 49 95
3 A4 Ron 16 86 79 93
4 A5 Silvia 15 57 99 91
I'd like to add up the Physics, English, and Maths columns and store the result in a separate column 'Grade'.
I've tried the code:
df['Physics'] + df['English'] + df['Maths']
But it just concatenates the values. I haven't been taught about lambda functions yet.
How do I go about this?
df['Grade'] = df['Physics'] + df['English'] + df['Maths']
If it concatenates, your data is probably stored as strings; convert the columns to float or integer first.
Check the data types first with df.dtypes.
Try:
df["total"] = df[["Physics", "English", "Maths"]].sum(axis=1)
df
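The sum(axis=1) answer assumes the three columns are already numeric. If they were read in as text, a minimal sketch of the check-then-convert approach looks like this. The sample frame is typed in from the question, with the marks deliberately stored as strings to reproduce the concatenation. pd.to_numeric is used instead of astype('int') because it also accepts values such as "87.5"; that is one reasonable choice, not the only one.

```python
import pandas as pd

# Sample data from the question; the marks are stored as strings on purpose
# to reproduce the "+ concatenates" symptom.
df = pd.DataFrame({
    "Roll": ["A1", "A2", "A3", "A4", "A5"],
    "Name": ["Max", "Lisa", "Luna", "Ron", "Silvia"],
    "Age": [16, 15, 17, 16, 15],
    "Physics": ["87", "47", "83", "86", "57"],
    "English": ["79", "75", "49", "79", "99"],
    "Maths": ["90", "60", "95", "93", "91"],
})
subjects = ["Physics", "English", "Maths"]

print(df[subjects].dtypes)  # not int64: the marks are stored as text

# With strings, "+" joins the text: "87" + "79" + "90" -> "877990"
print(df["Physics"] + df["English"] + df["Maths"])

# Convert the three columns to numbers, then add them up across each row.
df[subjects] = df[subjects].apply(pd.to_numeric)
df["Grade"] = df[subjects].sum(axis=1)
print(df[["Name", "Grade"]])
```

After the conversion, the plain "+" expression from the question also adds instead of concatenating.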
Check the code below. It is possible your columns are in string format; the following will solve that:
import pandas as pd
df = pd.DataFrame({"Physics":['1','2','3'],"English":['1','2','3'],"Maths":['1','2','3']})
df['Total'] = df['Physics'].astype('int') + df['English'].astype('int') + df['Maths'].astype('int')
df
Output:
Physics English Maths Total
0 1 1 1 3
1 2 2 2 6
2 3 3 3 9
I have the following data in a file:
NAME,AGE,MARKS
A1,12,40
B1,13,54
C1,15,67
D1,11,41
E1,16,59
F1,10,60
If the data were in a database table, I would have used the SUM and AVG functions to get the cumulative sum and average,
but doing this in Python is a bit challenging for me, as I am a learner.
Expected output:
NAME,AGE,MARKS,CUM_SUM,AVG
A1,12,40,40,40
B1,13,54,94,47
C1,15,67,161,53.66
D1,11,41,202,50.5
E1,16,59,261,52.2
F1,10,60,321,53.5
IIUC use:
import pandas as pd
df = pd.read_csv('file')
df['CUM_SUM'] = df['MARKS'].cumsum()
df['AVG'] = df['MARKS'].expanding().mean()
print (df)
NAME AGE MARKS CUM_SUM AVG
0 A1 12 40 40 40.000000
1 B1 13 54 94 47.000000
2 C1 15 67 161 53.666667
3 D1 11 41 202 50.500000
4 E1 16 59 261 52.200000
5 F1 10 60 321 53.500000
Finally, write the result back to a file with:
df.to_csv('file.csv', index=False)
Or:
out = df.to_string(index=False)
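A self-contained sketch of the same computation that also spells out where the running average comes from: each AVG is the running total divided by the number of rows seen so far, so E1 gets 261 / 5 = 52.2 and F1 gets 321 / 6 = 53.5. The in-memory CSV stands in for the real file, and the manual division is shown only for illustration; expanding().mean() gives the same numbers.

```python
import io

import numpy as np
import pandas as pd

# The question's file contents, read from an in-memory buffer so the sketch
# runs on its own; with a real file you would call pd.read_csv('file').
csv_text = """NAME,AGE,MARKS
A1,12,40
B1,13,54
C1,15,67
D1,11,41
E1,16,59
F1,10,60
"""
df = pd.read_csv(io.StringIO(csv_text))

# Running total of MARKS.
df["CUM_SUM"] = df["MARKS"].cumsum()
# Running average = running total / number of rows seen so far (1, 2, 3, ...).
df["AVG"] = df["CUM_SUM"] / np.arange(1, len(df) + 1)

print(df.to_string(index=False))
```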
This question already has answers here:
Split a Pandas column of lists into multiple columns
(11 answers)
Closed 1 year ago.
I have four values per row as the output of a function. Here's my data:
Name Grade
usia (75,78,90,52)
shdh (85,68,60,72)
fbjg (95,58,65,66)
Here's what I want:
Name Math English Physics Chemistry
usia 75 78 90 52
shdh 85 68 60 72
fbjg 95 58 65 66
Use the DataFrame constructor with DataFrame.pop to remove the original Grade column:
import ast
import pandas as pd
# if the Grade values are strings instead of tuples, parse them first:
# df['Grade'] = df['Grade'].apply(ast.literal_eval)
cols = ['Math','English','Physics','Chemistry']
df[cols] = pd.DataFrame(df.pop('Grade').tolist(), index=df.index)
print (df)
Name Math English Physics Chemistry
0 usia 75 78 90 52
1 shdh 85 68 60 72
2 fbjg 95 58 65 66
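The commented-out ast.literal_eval line covers the case where Grade holds strings such as "(75,78,90,52)" rather than real tuples. A self-contained sketch of that path, using hypothetical string input built from the question's values:

```python
import ast

import pandas as pd

# Hypothetical input in which Grade holds strings, not tuples.
df = pd.DataFrame({
    "Name": ["usia", "shdh", "fbjg"],
    "Grade": ["(75,78,90,52)", "(85,68,60,72)", "(95,58,65,66)"],
})

# Parse each string into a Python tuple.
df["Grade"] = df["Grade"].apply(ast.literal_eval)

cols = ["Math", "English", "Physics", "Chemistry"]
# pop() removes Grade from df and returns it; tolist() yields a list of tuples,
# which the DataFrame constructor spreads into one column per tuple position.
df[cols] = pd.DataFrame(df.pop("Grade").tolist(), index=df.index)
print(df)
```

Passing index=df.index keeps the new columns aligned with the original rows even if df does not have a default 0..n-1 index.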
When creating a dataframe as below (instructions from here), the order of the columns changes from "Day, Visitors, Bounce Rate" to "Bounce Rate, Day, Visitors"
import pandas as pd
web_stats = {'Day':[1,2,3,4,5,6],
'Visitors':[43,34,65,56,29,76],
'Bounce Rate':[65,67,78,65,45,52]}
df = pd.DataFrame(web_stats)
Gives:
Bounce Rate Day Visitors
0 65 1 43
1 67 2 34
2 78 3 65
3 65 4 56
4 45 5 29
5 52 6 76
How can the order be kept intact (i.e. Day, Visitors, Bounce Rate)?
One approach is to pass the columns argument explicitly.
Ex:
import pandas as pd
web_stats = {'Day':[1,2,3,4,5,6],
'Visitors':[43,34,65,56,29,76],
'Bounce Rate':[65,67,78,65,45,52]}
df = pd.DataFrame(web_stats, columns = ['Day', 'Visitors', 'Bounce Rate'])
print(df)
Output:
Day Visitors Bounce Rate
0 1 43 65
1 2 34 67
2 3 65 78
3 4 56 65
4 5 29 45
5 6 76 52
Dictionaries are not guaranteed to be ordered in Python < 3.7, and pandas versions before 0.23 sorted the keys of a plain dict when building a DataFrame.
You can use collections.OrderedDict instead:
from collections import OrderedDict
web_stats = OrderedDict([('Day', [1,2,3,4,5,6]),
('Visitors', [43,34,65,56,29,76]),
('Bounce Rate', [65,67,78,65,45,52])])
df = pd.DataFrame(web_stats)
If you don't want to write out the column names, which becomes inconvenient when you have many keys, you can pass the dictionary's own keys:
df = pd.DataFrame(web_stats, columns=list(web_stats.keys()))
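As far as I know, pandas 0.23 and later keep the insertion order of a plain dict on Python 3.6+, so the original code already gives Day, Visitors, Bounce Rate there. On any version, you can also reorder after construction by selecting the columns with a list, as in this sketch (wanted is just an illustrative name):

```python
import pandas as pd

web_stats = {'Day': [1, 2, 3, 4, 5, 6],
             'Visitors': [43, 34, 65, 56, 29, 76],
             'Bounce Rate': [65, 67, 78, 65, 45, 52]}

df = pd.DataFrame(web_stats)

# Whatever order the constructor produced, indexing with a list of column
# names returns the columns in exactly that order.
wanted = ['Day', 'Visitors', 'Bounce Rate']
df = df[wanted]
print(df)
```

df.reindex(columns=wanted) does the same, but inserts NaN columns for names that don't exist instead of raising a KeyError.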
I started learning pandas and am stuck on the issue below:
I have two large DataFrames:
df1=
ID KRAS ATM
TCGA-3C-AAAU-01A-11R-A41B-07 101 32
TCGA-3C-AALI-01A-11R-A41B-07 101 75
TCGA-3C-AALJ-01A-31R-A41B-07 102 65
TCGA-3C-ARLJ-01A-61R-A41B-07 87 54
df2=
ID BRCA1 ATM
TCGA-A1-A0SP 54 65
TCGA-3C-AALI 191 8
TCGA-3C-AALJ 37 68
The ID is the index in both dfs. First, I want to cut each ID in df1 down to only the first 10 digits (i.e. convert TCGA-3C-AAAU-01A-11R-A41B-07 to TCGA-3C-AAAU). Then I want to produce a new df from df1 containing only the IDs that exist in df2.
df3 should look like this:
ID KRAS ATM
TCGA-3C-AALI 101 75
TCGA-3C-AALJ 102 65
I tried different ways but failed. Any suggestions on this, please?
Here is one way using vectorised functions:
# truncate to first 10 characters, or 12 including '-'
df1['ID'] = df1['ID'].str[:12]
# filter for IDs in df2
df3 = df1[df1['ID'].isin(df2['ID'])]
Result
ID KRAS ATM
1 TCGA-3C-AALI 101 75
2 TCGA-3C-AALJ 102 65
Explanation
Use the .str accessor to limit df1['ID'] to its first 12 characters.
Mask df1 to include only IDs found in df2.
IIUC, TCGA-3C-AAAU contains 12 characters :-)
df3=df1.assign(ID=df1.ID.str[:12]).loc[lambda x:x.ID.isin(df2.ID),:]
df3
Out[218]:
ID KRAS ATM
1 TCGA-3C-AALI 101 75
2 TCGA-3C-AALJ 102 65
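Both answers treat ID as a column, while the question says it is the index. A sketch of the same truncate-and-filter steps applied to the index, assuming the frames are built from the question's sample rows:

```python
import pandas as pd

# df1 and df2 from the question, with ID as the index rather than a column.
df1 = pd.DataFrame(
    {"KRAS": [101, 101, 102, 87], "ATM": [32, 75, 65, 54]},
    index=pd.Index(
        ["TCGA-3C-AAAU-01A-11R-A41B-07", "TCGA-3C-AALI-01A-11R-A41B-07",
         "TCGA-3C-AALJ-01A-31R-A41B-07", "TCGA-3C-ARLJ-01A-61R-A41B-07"],
        name="ID",
    ),
)
df2 = pd.DataFrame(
    {"BRCA1": [54, 191, 37], "ATM": [65, 8, 68]},
    index=pd.Index(["TCGA-A1-A0SP", "TCGA-3C-AALI", "TCGA-3C-AALJ"], name="ID"),
)

# Index objects also have a .str accessor: keep the first 12 characters.
df1.index = df1.index.str[:12]

# Boolean mask: True where df1's truncated ID also appears in df2's index.
df3 = df1[df1.index.isin(df2.index)]
print(df3)
```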
I have a dataframe (df_input), and I'm trying to convert it to another dataframe (df_output) by applying a formula to each element in each row. The formula requires information about the whole row (min, max, median).
df_input:
A B C D E F G H I J
2011-01-01 60 48 26 29 41 91 93 87 39 65
2011-01-02 88 52 24 99 1 27 12 26 64 87
2011-01-03 13 1 38 60 8 50 59 1 3 76
df_output:
F(A) F(B) F(C) F(D) F(E) F(F) F(G) F(H) F(I) F(J)
2011-01-01 93 54 45 52 8 94 65 37 2 53
2011-01-02 60 44 94 62 78 77 37 97 98 76
2011-01-03 53 58 16 63 60 9 31 44 79 35
I'm trying to go from df_input to df_output, as above, by applying f(x) to each cell of each row. The function foo maps element x to f(x) by doing an OLS regression of the row's min, median and max to some co-ordinates. This is done for each period.
I'm aware that I need to iterate over the rows and then, for each row, apply the function to each element. Where I am struggling is getting the output of foo into df_output.
for index, row in df_input.iterrows():
    row_min = row.min()
    row_max = row.max()
    row_mean = row.mean()
    # apply function to each element of the row
    new_row = row.apply(lambda x: foo(x, row_min, row_max, row_mean))
    # add this to df_output
help!
My current thinking is to build up the new df row by row. I'm trying to do that, but I'm getting a lot of MultiIndex columns etc. Any pointers would be great.
thanks so much... merry xmas to you all.
Consider calculating the row aggregates with DataFrame methods (min, max, mean with axis=1) and then passing those Series into DataFrame.apply() across the columns:
cols = list('ABCDEFGHIJ')
# ROW-WISE AGGREGATES (over the data columns only, so the helper
# columns added here don't leak into the later aggregates)
df['row_min'] = df[cols].min(axis=1)
df['row_max'] = df[cols].max(axis=1)
df['row_mean'] = df[cols].mean(axis=1)
# COLUMN-WISE CALCULATION (DEFAULT axis=0)
new_df = df[cols].apply(lambda col: foo(col,
                                        df['row_min'],
                                        df['row_max'],
                                        df['row_mean']))
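Since foo itself isn't shown, the sketch below uses a hypothetical foo (min-max scaling of each element against its row's min and max) purely to show the data flow; the real OLS-based function would go in its place. It runs the column-wise apply from this answer and, for comparison, the question's iterrows loop, where the fix is to collect the transformed rows and build the frame once at the end.

```python
import pandas as pd

# df_input from the question.
df_input = pd.DataFrame(
    [[60, 48, 26, 29, 41, 91, 93, 87, 39, 65],
     [88, 52, 24, 99, 1, 27, 12, 26, 64, 87],
     [13, 1, 38, 60, 8, 50, 59, 1, 3, 76]],
    columns=list("ABCDEFGHIJ"),
    index=pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03"]),
)


def foo(x, row_min, row_max, row_mean):
    """Hypothetical stand-in for the question's OLS-based foo: min-max
    scaling. row_mean is accepted but unused here. Works on scalars as
    well as on index-aligned Series."""
    return (x - row_min) / (row_max - row_min)


row_min = df_input.min(axis=1)
row_max = df_input.max(axis=1)
row_mean = df_input.mean(axis=1)

# Column-wise apply: each column is a Series indexed by date, so it lines
# up element by element with the per-row aggregates.
df_output = df_input.apply(lambda col: foo(col, row_min, row_max, row_mean))
df_output.columns = [f"F({c})" for c in df_input.columns]

# The question's loop also works if each transformed row is collected in a
# dict and the frame is built once at the end.
rows = {}
for index, row in df_input.iterrows():
    r_min, r_max, r_mean = row.min(), row.max(), row.mean()
    rows[index] = row.apply(lambda x: foo(x, r_min, r_max, r_mean))
df_loop = pd.DataFrame(rows).T  # dict of Series becomes columns, so transpose
df_loop.columns = df_output.columns

print(df_output.round(3))
```

The vectorised version avoids the Python-level loop entirely; the loop version is mainly useful if foo can only handle one scalar at a time.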