For the columns whose name contains the string 'Time', I would like to add the Temp column to each of them. That is, for each item of Pax_cols (if there is more than one), I want to replace the column with the sum of that column and Temp.
data={'Run_Time':[60,20,30,45,70,100],'Temp':[10,20,30,50,60,100], 'Rest_Time':[5,5,5,5,5,5]}
df=pd.DataFrame(data)
Pax_cols = [col for col in df.columns if 'Time' in col]
df[Pax_cols[0]]= df[Pax_cols[0]] + df["Temp"]
This is what I came up with, but it only handles the case where Pax_cols has a single value, and it does not work for multiple columns.
Expected output:
data={'Run_Time':[70,40,60,95,130,200],'Temp':[10,20,30,50,60,100], 'Rest_Time':[15,25,35,55,65,105]}
You can use:
# get columns with "Time" in the name
cols = list(df.filter(like='Time'))
# ['Run_Time', 'Rest_Time']
# add the value of df['Temp']
df[cols] = df[cols].add(df['Temp'], axis=0)
output:
Run_Time Temp Rest_Time
0 70 10 15
1 40 20 25
2 60 30 35
3 95 50 55
4 130 60 65
5 200 100 105
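If you prefer working with the column index directly, the same selection can be done with `str.contains` instead of `df.filter`; a minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'Run_Time': [60, 20, 30, 45, 70, 100],
                   'Temp': [10, 20, 30, 50, 60, 100],
                   'Rest_Time': [5, 5, 5, 5, 5, 5]})

# select the target columns with a boolean mask over the column names
cols = df.columns[df.columns.str.contains('Time')]

# add Temp row-wise to every matched column in one shot
df[cols] = df[cols].add(df['Temp'], axis=0)
```

Both approaches produce the same expected output; `filter(like=...)` is just a shorthand for this kind of name-based selection.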
I am using the code below to search a .csv file, match a column in both files, and grab a different column I want, adding it as a new column. However, I am trying to make the match based on two columns instead of one. Is there a way to do this?
import pandas as pd
df1 = pd.read_csv("matchone.csv")
df2 = pd.read_csv("comingfrom.csv")
def lookup_prod(ip):
    for row in df2.itertuples():
        if ip in row[1]:
            return row[3]
    else:
        return '0'
df1['want'] = df1['name'].apply(lookup_prod)
df1[df1.want != '0']
print(df1)
#df1.to_csv('file_name.csv')
The code above searches on the column 'name' (the same name in both files) and gets the column I request ([3]) from df2. I want the code to match on both the 'name' column and the 'price' column, and take the value of ([3]) only when both columns match in df1 and df2.
df 1 :
name price value
a 10 35
b 10 21
c 10 33
d 10 20
e 10 88
df 2 :
name price want
a 10 123
b 5 222
c 10 944
d 10 104
e 5 213
When the code is run (asking for the want column from df2, matching only on df1 name = df2 name), the produced result is:
name price value want
a 10 35 123
b 10 21 222
c 10 33 944
d 10 20 104
e 10 88 213
However, what I want is that if both df1 name = df2 name and df1 price = df2 price, then take the column df2 want, so the desired result is:
name price value want
a 10 35 123
b 10 21 0
c 10 33 944
d 10 20 104
e 10 88 0
You need to use pandas.DataFrame.merge() method with multiple keys:
df1.merge(df2, on=['name','price'], how='left').fillna(0)
The method represents missing values as NaN, so the column's dtype changes to float64, but you can change it back after filling the missing values with 0.
Also please be aware that duplicated combinations of name and price in df2 will appear several times in the result.
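To make the dtype point concrete, here is a runnable sketch on the question's data that restores the integer dtype after the left merge:

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e'],
                    'price': [10, 10, 10, 10, 10],
                    'value': [35, 21, 33, 20, 88]})
df2 = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e'],
                    'price': [10, 5, 10, 10, 5],
                    'want': [123, 222, 944, 104, 213]})

out = df1.merge(df2, on=['name', 'price'], how='left')
# the NaNs from unmatched rows make 'want' float64; fill and cast back to int
out['want'] = out['want'].fillna(0).astype(int)
```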
If you are matching the two dataframes based on the name and the price, you can use df.where and df.isin
df1['want'] = df2['want'].where(df1[['name','price']].isin(df2).all(axis=1)).fillna('0')
df1
name price value want
0 a 10 35 123.0
1 b 10 21 0
2 c 10 33 944.0
3 d 10 20 104.0
4 e 10 88 0
Expanding on https://stackoverflow.com/a/73830294/20110802:
You can add the validate option to the merge in order to avoid duplication on one side (or both):
pd.merge(df1, df2, on=['name','price'], how='left', validate='1:1').fillna(0)
Also, if the float conversion is a problem for you, one option is to do an inner join first and then pd.concat the result with the "leftover" df1 where you already added a constant valued column. Would look something like:
df_inner = pd.merge(df1, df2, on=['name', 'price'], how='inner', validate='1:1')
merged_pairs = set(zip(df_inner.name, df_inner.price))
df_anti = df1.loc[~pd.Series(zip(df1.name, df1.price)).isin(merged_pairs)].copy()
df_anti['want'] = 0
df_result = pd.concat([df_inner, df_anti]) # perhaps ignore_index=True ?
Looks complicated, but it should be quite performant because it filters by set. There might be a way to set name and price as the index, merge on the index, and then filter by index to avoid the zip-set shenanigans, but I'm no expert on MultiIndex handling.
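For what it's worth, a sketch of that index-based variant (an assumption about how it might look, not a tested claim of equivalence): set name and price as a MultiIndex on both frames and let `join` do the alignment.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e'],
                    'price': [10, 10, 10, 10, 10],
                    'value': [35, 21, 33, 20, 88]}).set_index(['name', 'price'])
df2 = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e'],
                    'price': [10, 5, 10, 10, 5],
                    'want': [123, 222, 944, 104, 213]}).set_index(['name', 'price'])

# join aligns on the shared (name, price) MultiIndex; unmatched rows get NaN
result = df1.join(df2, how='left')
result['want'] = result['want'].fillna(0).astype(int)
result = result.reset_index()
```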
Try this code; it will give you the expected results:
import pandas as pd
df1 = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e'],
                    'price': [10, 10, 10, 10, 10],
                    'value': [35, 21, 33, 20, 88]})
df2 = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e'],
                    'price': [10, 5, 10, 10, 5],
                    'want': [123, 222, 944, 104, 213]})
new = pd.merge(df1,df2, how='left', left_on=['name','price'], right_on=['name','price'])
print(new.fillna(0))
In column J, I would like to get the value given by the Excel formula IF(H3>I3, C2, 0), and based on that, the occurrence value, i.e. counting from the bottom up, the first occurrence is the latest one and the one before it is the second occurrence.
Here is the solution:
import pandas as pd
import numpy as np
# suppose we have this DataFrame:
df = pd.DataFrame({'A':[55,23,11,100,9] , 'B':[12,72,35,4,100]})
# suppose we want to keep the values of the 'A' column where they are
# greater than or equal to the values in 'B', and return 0 otherwise;
# so I'll put the results in a new column named 'Result'
df['Result'] = np.where(df['A'] >= df['B'] , df['A'] , 0)
Then if you print the DataFrame:
df
result:
A B Result
0 55 12 55
1 23 72 0
2 11 35 0
3 100 4 100
4 9 100 0
I have a dataframe consisting of the following, and I want to add a new column based on:
high - open < x number
and High.rowNum >= Open.rowNum
Basically I just want to get the first row number that matches the criteria above and store it in a different column.
S/N   High   Low   Open   Close   Date    [New Column] e.g. High - Open >= 85 [Value of S/N]
1     100    20    22     90      1 Jan   1
2     200    40    72     50      2 Jan   3
3     390    20    55     90      2 Jan
As I understand your question and comment, you want the 'S/N' value in the new column when the row satisfies the criteria. You can simply use the apply function on the dataframe and store the result as a new column:
import numpy as np

df['New'] = df.apply(lambda x: x['S/N'] if x['High'] - x['Open'] >= 85 else np.nan, axis=1)
Here we get a new column with 'S/N' where the condition is satisfied; otherwise we fill it with NaN.
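Since the condition only involves whole columns, the row-wise apply can also be written as a vectorized `np.where`, which is usually faster. A minimal sketch on the question's data, reduced to the columns the condition uses:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'S/N': [1, 2, 3],
                   'High': [100, 200, 390],
                   'Open': [22, 72, 55]})

# np.where evaluates the condition on whole columns at once:
# keep S/N where High - Open >= 85, NaN otherwise
df['New'] = np.where(df['High'] - df['Open'] >= 85, df['S/N'], np.nan)
```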
I need to transform column values into headers in Python.
testdf = {'Student_id': ['10001', '10001', '10001', '20001', '20001', '30001', '30001', '30001'],
          'Subject': ['S1', 'S2', 'S3', 'S1', 'S2', 'S1', 'S2', 'S3'],
          'Mark': ['80', '60', '70', '50', '70', '90', '80', '40']}
testdf = pd.DataFrame(data=testdf)
testdf
I want to have a table like this:
When I tried the code below:
testdf.pivot(index="Student_id",columns="Subject")
I am getting the following:
Add the values parameter to DataFrame.pivot and, if necessary, clean up the result: DataFrame.rename_axis to remove the columns name and DataFrame.reset_index to turn the index back into a column:
df = (testdf.pivot(index="Student_id",columns="Subject", values='Mark')
.rename_axis(None, axis=1)
.reset_index())
print (df)
Student_id S1 S2 S3
0 10001 80 60 70
1 20001 50 70 NaN
2 30001 90 80 40
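One caveat worth knowing: `pivot` raises a ValueError if any (Student_id, Subject) pair occurs more than once. If your real data can contain duplicates, `pivot_table` handles them by aggregating; a sketch (using integer marks instead of the question's strings, so aggregation is straightforward):

```python
import pandas as pd

testdf = pd.DataFrame({'Student_id': ['10001', '10001', '10001', '20001',
                                      '20001', '30001', '30001', '30001'],
                       'Subject': ['S1', 'S2', 'S3', 'S1', 'S2', 'S1', 'S2', 'S3'],
                       'Mark': [80, 60, 70, 50, 70, 90, 80, 40]})

# pivot_table tolerates duplicate (index, columns) pairs by aggregating them;
# aggfunc='first' keeps the first mark seen for each pair
df = (testdf.pivot_table(index='Student_id', columns='Subject',
                         values='Mark', aggfunc='first')
            .rename_axis(None, axis=1)
            .reset_index())
```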
I have been struggling with appending multiple DataFrames with varying columns and would really appreciate your help with this problem!
My original data set looks like below
df1 = height 10
color 25
weight 3
speed 33
df2 = height 51
color 25
weight 30
speed 33
df3 = height 51
color 25
speed 30
I call the transform_csv_data(csv_data, row) function to first add the name in the last row. Then I transpose and move the name, which becomes the last column, to the first column for every DataFrame, so each DataFrame looks like below before appending (but before moving the last column to the front):
df1 =
0 1 2 3 4
0 height color weight speed name
1 10 25 3 33 Joe
df2 =
0 1 2 3 4
0 height color weight speed name
1 51 25 30 33 Bob
df3 =
0 1 2 3
0 height color speed name
1 51 25 30 Chris
The problem is appending DataFrames with different numbers of columns, where each DataFrame contains two rows, the header and the data, as above.
The code for transform_csv_data helper function is shown below
def transform_csv_data(self, csv_data, row):
    df = pd.DataFrame(list(csv_data))
    df = df.iloc[:, [0, -2]]  # all rows, with the first and second-to-last columns
    df.loc[len(df)] = ['name', row]
    df = df.transpose()
    cols = df.columns.values.tolist()  # this returns the index of each column
    cols.insert(0, cols.pop(-1))  # move last column to the front
    df = df.reindex(columns=cols)
    return df
My main function for appending DataFrame is shown below
def aggregate_data(self, output_data_file_path):
    df_output = pd.DataFrame()
    rows = ['Joe', 'Bob', 'Chris']
    for index, row in enumerate(rows):
        csv_data = self.read_csv_url(row)
        df = self.transform_csv_data(csv_data, row)
        # ignore header unless the first set of data is being processed
        if index != 0 or append:
            df = df[1:]
        df_output = df_output.append(df)
    df_output.to_csv(output_data_file_path, index=False, header=False, mode='a+')
I want my final appended DataFrame to look like below, but the format comes out wrong because the name column goes back to the end:
final =
name height color weight speed
Joe 10 25 3 33
Bob 51 25 30 33
Chris 51 25 nan 30
How can I append all the DataFrame properly so data is appended to its corresponding column?
I have tried adding concat, merge, and df_output = df_output.append(df_row)[df_output.columns.tolist()], but no luck so far.
There are also duplicate columns which I would like to keep.
Thank you so much for your help
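To illustrate the alignment the asker is after (this is a sketch on toy stand-ins for the three transposed frames, not the asker's full pipeline): once each frame carries real column labels instead of a header row stored as data, `pd.concat` aligns columns by label and fills the ones a frame lacks with NaN.

```python
import pandas as pd

# toy stand-ins for df1/df2/df3 after giving them proper column labels
df1 = pd.DataFrame([{'name': 'Joe', 'height': 10, 'color': 25, 'weight': 3, 'speed': 33}])
df2 = pd.DataFrame([{'name': 'Bob', 'height': 51, 'color': 25, 'weight': 30, 'speed': 33}])
df3 = pd.DataFrame([{'name': 'Chris', 'height': 51, 'color': 25, 'speed': 30}])

# concat aligns on column labels; df3 has no 'weight', so that cell becomes NaN
final = pd.concat([df1, df2, df3], ignore_index=True, sort=False)
final = final[['name', 'height', 'color', 'weight', 'speed']]
```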