Summing up values for rows per columns starting with 'Col' [duplicate] - python

This question already has answers here:
Pandas: sum DataFrame rows for given columns
(8 answers)
Closed 7 years ago.
I have a DataFrame like this:
df =
Col1 Col2 T3 T5
------------------
28 34 11 22
45 589 33 66
For each row I want to sum up the total values of columns whose names start with Col.
Is there some more elegant and quick way than the one shown below?
df['total'] = 0
for index, row in df.iterrows():
    total_for_row = 0
    for column_name, column in df.transpose().iterrows():
        if column_name.startswith('Col'):
            total_for_row = total_for_row + row[column_name]
    # write to df itself; `row` is only a copy of the row
    df.at[index, 'total'] = total_for_row

Build a boolean mask of the columns whose names start with Col, then sum the selected columns row-wise:
idx = df.columns.str.startswith('Col')
df['total'] = df.iloc[:,idx].sum(axis=1)
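The mask-based selection above can be tried end to end with the sketch below. It assumes the sample values from the question; the `DataFrame.filter` variant is an alternative added for illustration, not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({"Col1": [28, 45], "Col2": [34, 589], "T3": [11, 33], "T5": [22, 66]})

# Boolean mask over the column labels, as in the answer above
mask = df.columns.str.startswith("Col")
total_from_mask = df.loc[:, mask].sum(axis=1)

# Alternative: select the columns whose name matches the regex ^Col
total_from_filter = df.filter(regex="^Col").sum(axis=1)

df["total"] = total_from_mask
```

Both selections are computed before `total` is added, so the new column cannot leak into the sum.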

Related

Are the values of column xy of df1 also present in column zy of df2? [duplicate]

This question already has answers here:
Check if Pandas column contains value from another column
(3 answers)
Check if value from one dataframe exists in another dataframe
(4 answers)
Closed 11 months ago.
I have two dataframes and want to check which values in col1 of df1 also occur in col1 of df2. If a value occurs, col2_new should be 1, otherwise 0. Is it best to do this with a list, i.e. convert the df1 column to a list and loop over the column of the other dataframe, or is there a more elegant way?
df1 (before):
index col1
1 a
2 b
3 c
df2:
index col1
1 a
2 e
3 b
df1 (after):
index col1 col2_new
1 a 1
2 b 1
3 c 0
Use Series.isin and convert the resulting boolean mask to integers:
df1['col2_new'] = df1['col1'].isin(df2['col1']).astype(int)
Or, with NumPy (requires import numpy as np):
df1['col2_new'] = np.where(df1['col1'].isin(df2['col1']), 1, 0)
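A runnable sketch of both variants, assuming the two small frames from the question (the index values 1 to 3 are copied from the tables shown there):

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the question
df1 = pd.DataFrame({"col1": ["a", "b", "c"]}, index=[1, 2, 3])
df2 = pd.DataFrame({"col1": ["a", "e", "b"]}, index=[1, 2, 3])

# True where the df1 value appears anywhere in df2['col1'];
# isin compares values only, the index plays no role
in_df2 = df1["col1"].isin(df2["col1"])

# Variant 1: cast the boolean mask to integers
df1["col2_new"] = in_df2.astype(int)

# Variant 2: build the same flags with numpy.where
flags_np = np.where(in_df2, 1, 0)
```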

How to skip Column title row in Pandas DataFrame [duplicate]

This question already has answers here:
Prevent pandas read_csv treating first row as header of column names
(4 answers)
Closed 3 years ago.
How to skip Column title row in Pandas DataFrame
My code:
sample = pd.read_csv('Fremont TMY_Sample_Original.csv', low_memory=False)  # import the CSV
sample_header = sample.iloc[:1, 0:20]  # meant to separate the first two rows, which hold different data
sample2 = sample.iloc[:, 0:16]  # take the data required for the next step
sample2 = ('sample2', (header=False))  # attempt to skip the column title row (not valid Python)
print(sample2)
Expected output (an example):
Data for all year (this is the row I want to remove; I want to keep everything else)
Date Time(Hour) WindSpeed(m/s)
0 5 1 10
1 4 2 17
2 6 3 16
3 7 4 11
This should work
df = pd.read_csv("yourfile.csv", header = None)
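To see what `header=None` does compared with skipping the title line, here is a small sketch. The in-memory CSV text is a hypothetical stand-in for the asker's file, and `skiprows=1` is an alternative added for illustration rather than part of the answer above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: a title line, then the real header row
raw = (
    "Data for all year,,\n"
    "Date,Time(Hour),WindSpeed(m/s)\n"
    "5,1,10\n"
    "4,2,17\n"
    "6,3,16\n"
    "7,4,11\n"
)

# header=None: no line is used for column names, so every line becomes data
no_header = pd.read_csv(io.StringIO(raw), header=None)

# skiprows=1: drop the title line, then use the next line as the header
skipped = pd.read_csv(io.StringIO(raw), skiprows=1)
```

With `header=None` the title line and the column names both end up as ordinary rows, so `skiprows=1` may be closer to what the question asks for if the goal is to discard only the title line.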

python replace not na value [duplicate]

This question already has answers here:
How to replace NaN values by Zeroes in a column of a Pandas Dataframe?
(17 answers)
Closed 3 years ago.
I want to create a new column that holds 0 where col1 is NA and 1 where it is not missing.
#df
col1
1
3
NaN
5
NaN
6
What I want:
#df
col1 NewCol
1 1
3 1
NaN 0
5 1
NaN 0
6 1
This is what I tried:
df['NewCol']=df['col1'].fillna(0)
df['NewCol']=df['col1'].replace(df['col1'].notnull(), 1)
It seems that the second line is incorrect.
Any suggestion?
You can try:
df['NewCol'] = [*map(int, pd.notnull(df.col1))]
Hope this helps.
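A vectorized way to get the same flags is sketched below. The sample values are an assumption: the missing entries are placed where the question's desired output shows a 0.

```python
import numpy as np
import pandas as pd

# Assumed sample data: NaN where the desired output has NewCol == 0
df = pd.DataFrame({"col1": [1, 3, np.nan, 5, np.nan, 6]})

# notna() gives True for present values; casting to int yields 1/0
df["NewCol"] = df["col1"].notna().astype(int)
```

Unlike an approach that first fills NaN with 0, this keeps a genuine 0 in col1 flagged as present.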
First you will need to convert all 'na's into '0's. How you do this will vary by scope.
For a single column you can use:
df['DataFrame Column'] = df['DataFrame Column'].fillna(0)
For the whole dataframe you can use (fillna returns a new frame, so assign the result):
df = df.fillna(0)
After this, you need to replace all nonzeros with '1's. You could do this like so:
for index, entry in df['col'].items():
    if entry != 0:
        df.loc[index, 'col'] = 1  # .loc avoids chained assignment
Note that this method counts 0 as an empty entry, which may or may not be the desired functionality.

How to find the indices of identical rows based on two columns in two different pandas dataframe? [duplicate]

This question already has answers here:
python panda: return indexes of common rows
(2 answers)
Closed 4 years ago.
I have the following two pandas dataframes:
df1 = pd.DataFrame([[21,80,180],[23,95,191],[36,83,176]], columns = ["age", "weight", "height"])
df2 = pd.DataFrame([[22,88,184],[39,84,196],[23,95,190]], columns = ["age", "weight", "height"])
df1:
age weight height
0 21 80 180
1 23 95 191
2 36 83 176
df2:
age weight height
0 22 88 184
1 39 84 196
2 23 95 190
I would like to compare the two dataframes and get the indices of both dataframes where age and weight in one dataframe are equal to age and weight in the second dataframe. The result in this case would be:
matching_indices = [1,2] #[df1 index, df2 index]
I know how to achieve this with iterrows(), but I prefer something less time consuming since the dataset I have is relatively large. Do you have any ideas?
Use merge with the default inner join, and call reset_index on both frames first so that each original index becomes a column and is not lost in the merge:
df = df1.reset_index().merge(df2.reset_index(), on=['age','weight'], suffixes=('_df1','_df2'))
print (df)
index_df1 age weight height_df1 index_df2 height_df2
0 1 23 95 191 2 190
print (df[['index_df1','index_df2']])
index_df1 index_df2
0 1 2
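The merge above can be turned into the list of index pairs the question asks for. A minimal sketch using the question's data; the variable names `matches` and `matching_indices` are chosen here for illustration:

```python
import pandas as pd

df1 = pd.DataFrame([[21, 80, 180], [23, 95, 191], [36, 83, 176]],
                   columns=["age", "weight", "height"])
df2 = pd.DataFrame([[22, 88, 184], [39, 84, 196], [23, 95, 190]],
                   columns=["age", "weight", "height"])

# reset_index() turns each index into a column named "index";
# the suffixes tell the two copies apart after the merge
matches = df1.reset_index().merge(
    df2.reset_index(), on=["age", "weight"], suffixes=("_df1", "_df2")
)

# One [df1 index, df2 index] pair per matching row
matching_indices = matches[["index_df1", "index_df2"]].values.tolist()
```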

Add 2 column from 2 dataframe in pandas [duplicate]

This question already has answers here:
Pandas merge two dataframes summing values [duplicate]
(2 answers)
how to merge two dataframes and sum the values of columns
(2 answers)
Closed 4 years ago.
I am new to pandas. Could you help me with the case below, please?
I have 2 DF:
df1 = pd.DataFrame({'A': ['name', 'color', 'city', 'animal'], 'number': ['1', '32', '22', '13']})
df2 = pd.DataFrame({'A': ['name', 'color', 'city', 'animal'], 'number': ['12', '2', '42', '15']})
df1
A number
0 name 1
1 color 32
2 city 22
3 animal 13
df2
A number
0 name 12
1 color 2
2 city 42
3 animal 15
I need to get the sum of the number column, e.g.
DF1
A number
0 name 13
1 color 34
2 city 64
3 animal 28
But if I do new = df1 + df2, I get:
NEW
A number
0 namename 13
1 colorcolor 34
2 citycity 64
3 animalanimal 28
I even tried merge with on="A", but that did not work.
Can anyone enlighten me, please?
Thank you
Here are two different ways: one with add, and one with concat and groupby. In either case, you need to make sure that your number columns are numeric first (your example dataframes have strings):
# set `number` to numeric (could be float, I chose int here)
df1['number'] = df1['number'].astype(int)
df2['number'] = df2['number'].astype(int)
# method 1, set the index to `A` in each and add the two frames together:
df1.set_index('A').add(df2.set_index('A')).reset_index()
# method 2, concatenate the two frames, groupby A, and get the sum:
pd.concat((df1,df2)).groupby('A',as_index=False).sum()
Output:
A number
0 animal 28
1 city 64
2 color 34
3 name 13
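Both methods can be checked with the self-contained sketch below, which uses the frames from the question. The result names `by_add` and `by_groupby` are chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["name", "color", "city", "animal"],
                    "number": ["1", "32", "22", "13"]})
df2 = pd.DataFrame({"A": ["name", "color", "city", "animal"],
                    "number": ["12", "2", "42", "15"]})

# The numbers are stored as strings, so convert them before summing
df1["number"] = df1["number"].astype(int)
df2["number"] = df2["number"].astype(int)

# Method 1: align the two frames on A and add them
by_add = df1.set_index("A").add(df2.set_index("A")).reset_index()

# Method 2: stack the frames, then sum within each value of A
by_groupby = pd.concat((df1, df2)).groupby("A", as_index=False).sum()
```

The two results hold the same totals, but the row order may differ: groupby sorts the keys by default.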
Merging isn't a bad idea. You just need to convert the number columns to numeric, merge on the shared key column, and then sum the numeric columns selected via select_dtypes:
df1['number'] = pd.to_numeric(df1['number'])
df2['number'] = pd.to_numeric(df2['number'])
df = df1.merge(df2, on='A')
df['number'] = df.select_dtypes(include='number').sum(1) # 'number' means numeric columns
df = df[['A', 'number']]
print(df)
A number
0 name 13
1 color 34
2 city 64
3 animal 28
