Sum of columns from two data frames that contain float values - python

I have two data frames with the same column names.
I want to sum the float values in the matching columns of the two data frames.
For that I can use
df3 = df1.add(df2)
However, my data frames also contain two columns of strings, and these strings get added (concatenated) too.
How can I write the code so that it adds only the floats and not the strings?
The two sample data frames are as follows:
df1 = pd.DataFrame(dict(Team=['A','B','C','D'],Value=[1,2,3,4]),index=[0,1,2,3])
df2 = pd.DataFrame(dict(Team=['A','B','C','D'],Value=[3,1,2,4]),index=[0,1,2,3])
When I used df3 = df1.add(df2),
it also added the strings in column "Team", as follows:
Team Value
0 AA 4
1 BB 3
2 CC 5
3 DD 8
How can I write the code so that it adds Value but not Team?
Thanks,
Zep

Use the team names as indices instead of integer indices:
In [2]: df1 = pd.DataFrame(dict(Team=['A','B','C','D'],Value=[1,2,3,4])).set_index('Team')
...: df2 = pd.DataFrame(dict(Team=['A','B','C','D'],Value=[3,1,2,4])).set_index('Team')
In [3]: df1 + df2
Out[3]:
Value
Team
A 4
B 3
C 5
D 8
In case you have multiple other columns, just sum the columns:
total = df1['Value'] + df2['Value']
If, in addition, you need a dataframe of the same shape as df1 and df2 with Value replaced by the sum, you can do
df3 = df1.copy()
df3['Value'] = total
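If there are many numeric columns, listing them by hand gets tedious. A minimal sketch, assuming the columns to add are exactly the numeric ones, is to pick them by dtype with select_dtypes and leave everything else untouched (the float sample values below are made up to match the question):

```python
import pandas as pd

df1 = pd.DataFrame(dict(Team=['A', 'B', 'C', 'D'], Value=[1.0, 2.0, 3.0, 4.0]))
df2 = pd.DataFrame(dict(Team=['A', 'B', 'C', 'D'], Value=[3.0, 1.0, 2.0, 4.0]))

# Names of the numeric columns only; string columns such as Team are skipped.
numeric_cols = df1.select_dtypes(include='number').columns

# Start from a copy of df1 so the string columns are kept unchanged,
# then overwrite just the numeric columns with the element-wise sum.
df3 = df1.copy()
df3[numeric_cols] = df1[numeric_cols].add(df2[numeric_cols])

print(df3)
```

This relies on both frames sharing the same index and the same numeric column names, as in the question.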

Related

add/combine columns after searching in a DataFrame

I'm trying to copy data from different columns to a particular column in the same DataFrame.
Index col1A col2A colB list CT CW CH
0 1 :
1 b
2 2
3 3d
But before that, I want to check whether those columns (col1A, col2A, colB) exist in the DataFrame, group the ones that are present, and move the grouped data to the relevant columns (CT, CH, etc.), like this:
CH CW CT
0 1 1
1 b b
2 2 2
3 3d 3d
I did:
col_list1 = ['ColA','ColB','ColC']
test1 = any([ i in df.columns for i in col_list1 ])
if test1==True:
    df['CH'] = df['Col1A'] +df['Col2A']
    df['CT'] = df['ColB']
This code throws a KeyError.
I want it to ignore columns that are not present and add only those that are present.
IIUC, you can use a Python set or Index.isin to find the common columns:
cols = list(set(col_list1) & set(df.columns))
# or
cols = df.columns[df.columns.isin(col_list1)]
df['CH'] = df[cols].sum(axis=1)
Instead of just concatenating the columns with +, collect the columns that actually exist into a list and use sum with axis=1:
df['CH'] = df[[c for c in col_list1 if c in df.columns]].sum(axis=1)
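The KeyError in the question comes from the any(...) test, which only confirms that at least one listed name exists before the code indexes columns that may be absent. A small sketch of the filter-then-sum idea, using a hypothetical frame in which 'colB' is deliberately missing and the values are numeric:

```python
import pandas as pd

# Hypothetical sample frame: 'colB' is deliberately missing.
df = pd.DataFrame({'col1A': [1, 2, 3], 'col2A': [10, 20, 30]})

wanted = ['col1A', 'col2A', 'colB']

# Keep only the requested names that exist, preserving the order of `wanted`.
present = [c for c in wanted if c in df.columns]

# Row-wise sum over whatever is present; no KeyError for the missing 'colB'.
df['CH'] = df[present].sum(axis=1)

print(df)
```

Filtering with a list comprehension keeps the column order of the wanted list, whereas the set intersection does not guarantee any order.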

Compare 2 columns and merge rows on match?

I'm new to coding and working on a project. I want to compare two DataFrames and, whenever a row's value in the product column matches, copy that row over to a new DataFrame. The rows in DF1 and DF2 will not be in the same position, so I want to compare each row of DF1 against the entire column in DF2. Is there an easy solution to this?
Take a look at this: https://cmdlinetips.com/2018/02/how-to-subset-pandas-dataframe-based-on-values-of-a-column/
You can try:
df3 = df1[df1['Product'].isin(set(df2['Product']))]
Which gives:
>>> df1 = pd.DataFrame({'prod':[1,2], 'ean':[5,6]})
>>> df1
prod ean
0 1 5
1 2 6
>>> df2 = pd.DataFrame({'prod':[3,2]})
>>> df2
prod
0 3
1 2
>>> df1[df1['prod'].isin(set(df2['prod']))]
prod ean
1 2 6
To explain:
df1[...] is to filter the rows of df1 based on criterion ...
I'm using a set() here so it is fast to check whether a row in df1 is in df2's "Product" column
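If the matching rows should also carry columns from df2, an inner merge is one alternative to isin. This is only a sketch: the 'price' column is made up for illustration and is not part of the question.

```python
import pandas as pd

df1 = pd.DataFrame({'prod': [1, 2], 'ean': [5, 6]})
# 'price' is a hypothetical extra column, added only to show what merge brings along.
df2 = pd.DataFrame({'prod': [3, 2], 'price': [9.5, 7.0]})

# Inner join on 'prod': keeps only products present in both frames,
# regardless of row position, and combines the columns of both.
matched = df1.merge(df2, on='prod', how='inner')

print(matched)
```

Unlike the isin filter, a merge repeats a row of df1 once for every matching row in df2, so duplicates in df2's key column would multiply the result.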

Is there a Python function to repeat string patterns to get faster multiple columns in different dataframes

I'm new to Python and need to pull many variables (columns) from multiple dataframes.
I wrote the code below, but it takes a long time to adapt it for many exercises.
This is the code:
import pandas as pd
df = pd.concat([df1[df1.columns[0]], df2[df1.columns[0]], df1[df1.columns[1]],
                df2[df1.columns[1]], df1[df1.columns[2]], df2[df1.columns[2]],
                df1[df1.columns[3]], df2[df1.columns[3]], df1[df1.columns[4]],
                df2[df1.columns[4]], df1[df1.columns[5]], df2[df1.columns[5]],
                df1[df1.columns[6]], df2[df1.columns[6]]], axis=1)
The number of dataframes and columns can be much bigger. Thanks.
It looks like what you're trying to do is: for all of the columns in one dataframe, combine the columns from that dataframe with those from another with the same columns, into a single dataframe with two of every column in the same original order.
In your case:
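from pandas import DataFrame, concat
df1 = DataFrame([['a','b','c'], ['d','e','f']])
df2 = DataFrame([['g','h','i'], ['j','k','l']])
# for each column of df1, take that column from df1 and then from df2
df = concat([s for ss in [(df1[c], df2[c]) for c in df1.columns] for s in ss], axis=1)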
print(df)
Result:
0 0 1 1 2 2
0 a g b h c i
1 d j e k f l
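Since the question mentions that the number of dataframes can grow, the same idea can be sketched for any number of frames. The helper name interleave_columns and the third frame are hypothetical, and the sketch assumes every frame has all the columns of the first one:

```python
import pandas as pd

def interleave_columns(frames):
    """Concatenate same-named columns of several frames side by side."""
    # For each column of the first frame, take that column from every frame in turn.
    pieces = [frame[col] for col in frames[0].columns for frame in frames]
    return pd.concat(pieces, axis=1)

df1 = pd.DataFrame([['a', 'b', 'c'], ['d', 'e', 'f']])
df2 = pd.DataFrame([['g', 'h', 'i'], ['j', 'k', 'l']])
df3 = pd.DataFrame([['m', 'n', 'o'], ['p', 'q', 'r']])

df = interleave_columns([df1, df2, df3])
print(df)
```

The resulting frame has duplicate column labels, so selecting a label such as df[0] returns several columns at once.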

Matching the column names of two pandas data-frames in python

I have two pandas dataframes with names df1 and df2 such that
df1: a b c d
1 2 3 4
5 6 7 8
and
df2: b c
12 13
I want the result to be like
result: b c
2 3
6 7
Note that a, b, c, d are the column names of the pandas dataframe. The two dataframes differ in shape and values. I want to match the column names of df2 against those of df1 and select, keeping all rows, the columns of df1 whose names also appear in df2. df2 is used only to select the specific columns of df1. I tried the code below, but it gives me an empty index.
df1.columns.intersection(df2.columns)
The above code does not give me my result, as it returns only the column headers with no values. I want to write code that takes my two dataframes as input and compares the column headers for the selection, so that I don't have to hard-code column names.
I believe you need:
df = df1[df1.columns.intersection(df2.columns)]
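Or, as @Zero pointed out in the comments (older pandas versions only; & on an Index no longer performs a set intersection in pandas 2.0 and later):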
df = df1[df1.columns & df2.columns]
Or use reindex (columns of df2 that are missing from df1 would come back filled with NaN):
In [594]: df1.reindex(columns=df2.columns)
Out[594]:
b c
0 2 3
1 6 7
Or equivalently:
In [595]: df1.reindex(df2.columns, axis=1)
Out[595]:
b c
0 2 3
1 6 7
As an alternative to intersection, use a boolean mask over the columns with .loc:
df = df1.loc[:, df1.columns.isin(df2.columns)]
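The attempt in the question returned only labels because Index.intersection yields an Index of column names, which still has to be used to select from df1. A short sketch with the question's data, showing that the intersection and the boolean-mask selections agree:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=list('abcd'))
df2 = pd.DataFrame([[12, 13]], columns=['b', 'c'])

# An Index holding the shared column names, not the data itself.
common = df1.columns.intersection(df2.columns)

# Select those columns from df1, keeping all rows.
by_intersection = df1[common]

# Same selection with a boolean mask over df1's columns.
by_mask = df1.loc[:, df1.columns.isin(df2.columns)]

print(by_intersection)
```

Both forms silently skip any column of df2 that df1 lacks, which is the main difference from reindex.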

Python Pandas concatenate dataframes and rename index

I have three dataframes, all of which have 50 columns and one row. The same column names are used in each dataframe, and the single row is always indexed as 0. I'm trying to concatenate them to make viewing and comparing the data easier.
features = pd.concat([raw_features, fea_features, transformed_features], axis=0)
Now I want to rename the rows. I've tried several things including:
features = pd.concat([raw_features, fea_features, transformed_features], axis=0).reindex(['Raw_pulltest', 'FEA', 'Transformed_pulltest'])
which gives the error cannot reindex from a duplicate axis
and
features = pd.concat([raw_features, fea_features, transformed_features], axis=0).reset_index().reindex(['Raw_pulltest', 'FEA', 'Transformed_pulltest'])
which gives me the structure I want, except all values are now nan.
Please can you help me rename the index on the concatenated dataframe?
Use the keys parameter in pd.concat, then drop the original inner index level:
pd.concat([raw_features, fea_features, transformed_features],
          axis=0, keys=['Raw_pulltest', 'FEA', 'Transformed_pulltest'])\
  .reset_index(level=1, drop=True)
Example:
d1 = pd.DataFrame([[1,1,1]],index=[0])
d2 = pd.DataFrame([[2,2,2]],index=[0])
d3 = pd.DataFrame([[3,3,3]], index=[0])
pd.concat([d1,d2,d3],axis=0, keys=['d1','d2','d3']).reset_index(level=1, drop=True)
Output:
0 1 2
d1 1 1 1
d2 2 2 2
d3 3 3 3
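The attempts in the question fail because reindex selects rows by existing labels rather than renaming them: after the concat every label is 0 (a duplicate axis), and after reset_index the labels are 0, 1, 2, so none match the new names and the rows come back as NaN. Another option, sketched here with the same toy frames, is to overwrite the index directly:

```python
import pandas as pd

d1 = pd.DataFrame([[1, 1, 1]], index=[0])
d2 = pd.DataFrame([[2, 2, 2]], index=[0])
d3 = pd.DataFrame([[3, 3, 3]], index=[0])

features = pd.concat([d1, d2, d3], axis=0)

# Overwrite the labels positionally; the list length must match the row count.
features.index = ['d1', 'd2', 'd3']

print(features)
```

Direct assignment depends on the row order of the concat, while keys ties each label to its source frame explicitly.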
