I have a pandas DataFrame and I would like to add the value in each row of the "total_load" column to the running value of the "Battery capacity" column, starting from 5200. For example, 5200 + (-445) = 4755, then 4755 + (-380) = 4375, and so on.
What I am doing right now adds the "total_load" value to 5200 in every row, which is not what I want. Any ideas how I can write that? Should I use a for loop?
df["Battery capacity"] = 5200 + df["total_load"]
Output should be something like:
time total_load battery capacity
2016-06-01 00:00:00 -445 4755
2016-06-01 01:00:00 -380 4375
2016-06-01 02:00:00 -350 4025
Thanks!
IIUC, use cumsum to get a "running total" of total_load:
df['Battery capacity'] = df['total_load'].cumsum() + 5200
Output:
Battery capacity total_load
time
2016-01-01 00:00:00 4755.0 -445.0
2016-01-01 01:00:00 4375.0 -380.0
2016-01-01 02:00:00 4025.0 -350.0
2016-01-01 03:00:00 3685.0 -340.0
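A minimal, self-contained sketch of the running-total idea above, assuming a starting capacity of 5200 and the three sample rows from the question (the name initial_capacity is introduced here only for illustration):

```python
import pandas as pd

# Sample rows from the question; the timestamps are illustrative.
df = pd.DataFrame(
    {"total_load": [-445, -380, -350]},
    index=pd.to_datetime(
        ["2016-06-01 00:00:00", "2016-06-01 01:00:00", "2016-06-01 02:00:00"]
    ),
)

initial_capacity = 5200  # assumed starting value, taken from the question

# cumsum() gives the running total of the load; adding the starting
# capacity turns it into the remaining capacity after each hour.
df["Battery capacity"] = initial_capacity + df["total_load"].cumsum()

print(df)
```

No explicit loop is needed, since cumsum() already carries the previous rows' total forward.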
Good evening,
is it possible to calculate with - let's say - two columns inside a dataframe and add a third column with the fitting result?
Dataframe (original):
name time_a time_b
name_a 08:00:00 09:00:00
name_b 07:45:00 08:15:00
name_c 07:00:00 08:10:00
name_d 06:00:00 10:00:00
Or, to be specific: is it possible to obtain the difference of two times (time_b - time_a) and create a new column (time_c) at the end of the dataframe?
Dataframe (new):
name time_a time_b time_c
name_a 08:00:00 09:00:00 01:00:00
name_b 07:45:00 08:15:00 00:30:00
name_c 07:00:00 08:10:00 01:10:00
name_d 06:00:00 10:00:00 04:00:00
Thanks and a good night!
If your columns are in datetime or timedelta format:
# New column is a timedelta object
df["time_c"] = (df["time_b"] - df["time_a"])
If your columns are in datetime.time format (which it appears they are):
import datetime

def time_diff(time_1, time_2):
    """returns the difference between time 1 and time 2 (time_2-time_1)"""
    # attach both times to the same date so they can be subtracted
    now = datetime.datetime.now()
    time_1 = datetime.datetime.combine(now, time_1)
    time_2 = datetime.datetime.combine(now, time_2)
    return time_2 - time_1
# Apply the function
df["time_c"] = df[["time_a","time_b"]].apply(lambda arr: time_diff(*arr), axis=1)
Alternatively, you can convert to a timedelta by first converting to a string:
df["time_a"]=pd.to_timedelta(df["time_a"].astype(str))
df["time_b"]=pd.to_timedelta(df["time_b"].astype(str))
df["time_c"] = df["time_b"] - df["time_a"]
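As a small worked example of the string-to-timedelta route, here is a sketch that assumes the columns hold datetime.time objects (the sample rows are taken from the question). Unlike the snippet above, it keeps time_a and time_b unchanged and stores the converted values in the hypothetical variables start and end:

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    {
        "name": ["name_a", "name_b"],
        "time_a": [datetime.time(8, 0), datetime.time(7, 45)],
        "time_b": [datetime.time(9, 0), datetime.time(8, 15)],
    }
)

# str(datetime.time(8, 0)) is "08:00:00", a format pd.to_timedelta can parse.
start = pd.to_timedelta(df["time_a"].astype(str))
end = pd.to_timedelta(df["time_b"].astype(str))

# Subtracting two timedelta columns gives a timedelta column.
df["time_c"] = end - start

print(df)
```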
I have 7 columns of data, indexed by datetime (30-minute frequency), starting from 2017-05-31 and ending on 2018-05-25. I want to plot the mean over specific date ranges (seasons). I have been trying groupby, but I can't get it to group by a specific range. I get wrong results if I do df.groupby(df.date.dt.month).mean().
A few lines from the dataset (date range is from 2017-05-31 to 2018-05-25)
50 51 56 58
date
2017-05-31 00:00:00 200.213542 276.929198 242.879051 NaN
2017-05-31 00:30:00 200.215478 276.928229 242.879051 NaN
2017-05-31 01:00:00 200.215478 276.925324 242.878083 NaN
2017-06-01 01:00:00 200.221288 276.944691 242.827729 NaN
2017-06-01 01:30:00 200.221288 276.944691 242.827729 NaN
2017-08-31 09:00:00 206.961886 283.374453 245.041349 184.358250
2017-08-31 09:30:00 206.966727 283.377358 245.042317 184.360187
2017-12-31 09:00:00 212.925877 287.198416 247.455413 187.175144
2017-12-31 09:30:00 212.926846 287.196480 247.465097 187.179987
2018-03-31 23:00:00 213.304498 286.933093 246.469647 186.887548
2018-03-31 23:30:00 213.308369 286.938902 246.468678 186.891422
2018-04-30 23:00:00 215.496812 288.342024 247.522230 188.104749
2018-04-30 23:30:00 215.497781 288.340086 247.520294 188.103780
I have created these variables (These are the ranges I need)
increment_rates_winter = df['2017-08-30'].mean() - df['2017-06-01'].mean()
increment_rates_spring = df['2017-11-30'].mean() - df['2017-09-01'].mean()
increment_rates_summer = df['2018-02-28'].mean() - df['2017-12-01'].mean()
increment_rates_fall = df['2018-05-24'].mean() - df['2018-03-01'].mean()
Concatenated them:
df_seasons =pd.concat([increment_rates_winter,increment_rates_spring,increment_rates_summer,increment_rates_fall],axis=1)
and after plotting, I got this:
However, I've been trying to get this:
df_seasons
Out[664]:
Winter Spring Summer Fall
50 6.697123 6.948447 -1.961549 7.662622
51 6.428329 4.760650 -2.188402 5.927087
52 5.580953 6.667529 1.136889 12.939295
53 6.406259 2.506279 -2.105125 6.964549
54 4.332826 3.678492 -2.574769 6.569398
56 2.222032 3.359607 -2.694863 5.348258
58 NaN 1.388535 -0.035889 4.213046
The seasons in x and the means plotted for each column.
Winter = df['2017-06-01':'2017-08-30']
Spring = df['2017-09-01':'2017-11-30']
Summer = df['2017-12-01':'2018-02-28']
Fall = df['2018-03-01':'2018-05-30']
Thank you in advance!
You can select a specific date range with a boolean mask, as in the following code, and then take the mean of the result:
import pandas as pd
df = pd.read_csv('test.csv')
df['date'] = pd.to_datetime(df['date'])
start_date = "2017-12-31 09:00:00"
end_date = "2018-04-30 23:00:00"
mask = (df['date'] > start_date) & (df['date'] <= end_date)
f_df = df.loc[mask]
This gives the output
date 50 ... 58
8 2017-12-31 09:30:00 212.926846 ... 187.179987 NaN
9 2018-03-31 23:00:00 213.304498 ... 186.887548 NaN
10 2018-03-31 23:30:00 213.308369 ... 186.891422 NaN
11 2018-04-30 23:00:00 215.496812 ... 188.104749 NaN
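To go from one date range to all four seasons, something like the following sketch might work. It assumes the data is indexed by a sorted DatetimeIndex (as in the question) and uses label slicing rather than a boolean mask. The data here is synthetic, and the names seasons and season_means are made up for the example:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 30-minute samples, two columns.
idx = pd.date_range("2017-05-31", "2018-05-25 23:30", freq="30min")
df = pd.DataFrame(
    {
        "50": np.linspace(200.0, 216.0, len(idx)),
        "51": np.linspace(276.0, 288.0, len(idx)),
    },
    index=idx,
)
df.index.name = "date"

# Season boundaries copied from the question.
seasons = {
    "Winter": ("2017-06-01", "2017-08-30"),
    "Spring": ("2017-09-01", "2017-11-30"),
    "Summer": ("2017-12-01", "2018-02-28"),
    "Fall": ("2018-03-01", "2018-05-30"),
}

# Slicing a DatetimeIndex with date strings includes both end days in full.
# Each slice is averaged per column, giving one column of means per season.
season_means = pd.DataFrame(
    {name: df.loc[start:end].mean() for name, (start, end) in seasons.items()}
)

print(season_means)
```

The result has one row per original column and one column per season. If the increments are wanted instead of the means, the same slices could be reduced with the last value minus the first.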
Hope this helps
How about transposing it:
df_seasons.T.plot()
Output:
(Not duplicate / my question is entirely different)
My dataframe looks like this:
# [df2] is day based
time time2
2017-01-01, 2017-01-01 00:12:00
2017-01-02, 2017-01-02 03:15:00
2017-01-03, 2017-01-03 01:25:00
2017-01-04, 2017-01-04 04:12:00
2017-01-05, 2017-01-05 00:45:00
....
# [df] is minute based
time value
2017-01-01 00:01:00, 0.1232
2017-01-01 00:02:00, 0.1232
2017-01-01 00:03:00, 0.1232
2017-01-01 00:04:00, 0.1232
2017-01-01 00:05:00, 0.1232
....
I want to create a new column called time_val_min in [df2] that holds the minimum value from [df] within the time range given by df2['time'] and df2['time2'].
What did I do?
I did df2['time_val_min'] = df[df['time'].dt.hour.between(df2['time'], df2['time'])].min() but it does not work.
Could you please let me know how to fix it?
You can merge the two data frames on the date, and then filter on the time:
# create the date from the time column
df['date'] = df['time'].dt.normalize()
# merge
new_df = (df.merge(df2, left_on='date',   # left on date
                   right_on='time',       # right on time, if time is purely beginning of days
                   how='right',
                   suffixes=['','_y'])
            .query('time < time2')
            .groupby('date')
            ['time'].min()
            .to_frame(name='time_val_min')
            .merge(df2, right_on='time', left_index=True)
         )
Output:
time_val_min time time2
0 2017-01-01 00:01:00 2017-01-01 2017-01-01 00:12:00
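If the goal is the smallest value (rather than the earliest time) inside each window, a variation along these lines might do it. This is only a sketch with made-up numbers: it assumes df2['time'] is always midnight of the day, treats the window as inclusive of time2, and introduces the helper names merged, in_window and window_min:

```python
import pandas as pd

# Minute-based frame (values are made up for illustration).
df = pd.DataFrame(
    {
        "time": pd.date_range("2017-01-01 00:01", periods=5, freq="min"),
        "value": [0.5, 0.3, 0.9, 0.1, 0.7],
    }
)

# Day-based frame: each row defines a window from time to time2.
df2 = pd.DataFrame(
    {
        "time": pd.to_datetime(["2017-01-01"]),
        "time2": pd.to_datetime(["2017-01-01 00:03:00"]),
    }
)

# Attach the calendar day to every minute row so it can be matched to df2.
df["date"] = df["time"].dt.normalize()

# df's "time" keeps its name; df2's "time" becomes "time_day".
merged = df.merge(df2, left_on="date", right_on="time", suffixes=("", "_day"))

# Keep only rows inside the window, then take the smallest value per day.
in_window = merged[merged["time"] <= merged["time2"]]
window_min = in_window.groupby("date")["value"].min()

# Look up each day's minimum by its midnight timestamp.
df2["time_val_min"] = df2["time"].map(window_min)

print(df2)
```

Here only the first three minutes fall inside the window, so the 0.1 at 00:04 is ignored.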
My dataset looks like this:
time Open
2017-01-01 00:00:00 1.219690
2017-01-01 01:00:00 1.688490
2017-01-01 02:00:00 1.015285
2017-01-01 03:00:00 1.357672
2017-01-01 04:00:00 1.293786
2017-01-01 05:00:00 1.040048
2017-01-01 06:00:00 1.225080
2017-01-01 07:00:00 1.145402
...., ....
2017-12-31 23:00:00 1.145402
I want to find the sum between the time-range specified and save it to new dataframe.
let's say,
I want to find the sum between 2017-01-01 22:00:00 and 2017-01-02 04:00:00, i.e. a 6-hour window that spans two days. More generally, I want the sum of the data in each window from 10 PM to 4 AM the next day, stored in a separate data frame, for example df_timerange_sum. Please note that each window covers two different dates.
What did I do?
I used sum() to calculate over the time range like this: df[~df['time'].dt.hour.between(10, 4)].sum() but it gives me the sum of the whole df rather than of the time range I specified.
I also tried the resample but I cannot find a way to do it for time-specific
df['time'].dt.hour.between(10, 4) is always False because no number is larger than 10 and smaller than 4 at the same time. What you want is to mark between(4,21) and then negate that to get the other hours.
Here's what I would do:
# mark the rows between 4AM and 10PM
# the data we want is where s==False, i.e. ~s
s = df['time'].dt.hour.between(4, 21)
# s.cumsum() stays constant over each consecutive block of False values,
# so it labels the blocks on which we will take the sum
blocks = s.cumsum()
# again, we only care about ~s
(df[~s].groupby(blocks[~s], as_index=False)   # we don't need the blocks as index
       .agg({'time':'min', 'Open':'sum'})     # time : min -- select the beginning of each block
)                                             # Open : sum -- compute the sum of Open
Output for random data:
time Open
0 2017-01-01 00:00:00 1.282701
1 2017-01-01 22:00:00 2.766324
2 2017-01-02 22:00:00 2.838216
3 2017-01-03 22:00:00 4.151461
4 2017-01-04 22:00:00 2.151626
5 2017-01-05 22:00:00 2.525190
6 2017-01-06 22:00:00 0.798234
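To see why the cumsum labels work, here is a small self-contained sketch on two days of synthetic hourly data with Open fixed at 1.0, so each sum simply counts the hours in the block. The names daytime, block_id and night_sums are made up for the example, and reset_index(drop=True) is used here in place of as_index=False:

```python
import pandas as pd

# Two days of hourly data; Open is constant so the sums are easy to check.
df = pd.DataFrame(
    {
        "time": pd.date_range("2017-01-01", periods=48, freq=pd.Timedelta(hours=1)),
        "Open": 1.0,
    }
)

# True for hours 4 through 21, i.e. the rows we do NOT want.
daytime = df["time"].dt.hour.between(4, 21)

# cumsum() only increases on daytime rows, so every run of consecutive
# night rows shares one value, which can serve as a group label.
block_id = daytime.cumsum().rename("block")

night_sums = (
    df[~daytime]
    .groupby(block_id[~daytime])
    .agg({"time": "min", "Open": "sum"})  # block start and block total
    .reset_index(drop=True)
)

print(night_sums)
```

The first block (midnight to 3 AM on the first day) and the last block (10 PM and 11 PM on the final day) are partial windows, which matches the short first row in the output above.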
An alternative (in my opinion more straightforward) approach that accomplishes the same thing. There are definitely ways to shorten the code, but I am also relatively new to pandas.
df.set_index(['time'],inplace=True) #make time the index col (not 100% necessary)
df2=pd.DataFrame(columns=['start_time','end_time','sum_Open']) #new df that stores your desired output + start and end times if you need them
df2['start_time']=df[df.index.hour == 22].index #gets/stores all start datetimes
#each window ends at 4 AM the next day; taking df[df.index.hour == 4].index instead would pair each start with a 4 AM that comes before it
df2['end_time']=df2['start_time'] + pd.Timedelta(hours=6)
for i,row in df2.iterrows():
    #DataFrame.set_value was removed in pandas 1.0, so use .at
    df2.at[i,'sum_Open'] = df[(df.index >= row['start_time']) & (df.index <= row['end_time'])]['Open'].sum()
The last window is cut short because the data ends at 11 PM on the final day, so you'd have to add an if statement or something if you want to handle it differently.
I currently have two pandas data frames, both indexed by a pandas DatetimeIndex.
df1
datetimeindex value
2014-01-01 00:00:00 204.501667
2014-01-01 01:00:00 125.345000
2014-01-01 02:00:00 119.660000
df2 (where the year 1900 is a filler year I added during import. Actual year does not matter)
datetimeindex temperature
1900-01-01 00:00:00 48.2
1900-01-01 01:00:00 30.2
1900-01-01 02:00:00 42.8
I would like to use pd.merge to combine the data frames based on the left index, however, I would like to ignore the year altogether to yield this:
merged_df
datetimeindex value temperature
2014-01-01 00:00:00 204.501667 48.2
2014-01-01 01:00:00 125.345000 30.2
2014-01-01 02:00:00 119.660000 42.8
so far I have tried:
merged_df = pd.merge(df1,df2,left_on =
['df1.index.month','df1.index.day','df1,index.hour'],right_on =
['df2.index.month','df2.index.day','df2.index.hour'],how = 'left')
which gave me the error KeyError: 'df2.index.month'
Is there a way to perform this merge as I have outlined it?
Thanks
You have to lose the quotes:
In [11]: pd.merge(df1, df2, left_on=[df1.index.month, df1.index.day, df1.index.hour],
                  right_on=[df2.index.month, df2.index.day, df2.index.hour])
Out[11]:
key_0 key_1 key_2 value temperature
0 1 1 0 204.501667 48.2
1 1 1 1 125.345000 30.2
2 1 1 2 119.660000 42.8
Here "df2.index.month" is a string whereas df2.index.month is the array of months.
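The merge above returns a fresh integer index plus key columns. If the goal is to keep df1's timestamps as the index, as in the desired output, one possible sketch is to build explicit key columns first. The helper with_keys and the column names month, day and hour are made up for the example:

```python
import pandas as pd

hour = pd.Timedelta(hours=1)
df1 = pd.DataFrame(
    {"value": [204.501667, 125.345, 119.66]},
    index=pd.date_range("2014-01-01", periods=3, freq=hour),
)
df2 = pd.DataFrame(
    {"temperature": [48.2, 30.2, 42.8]},
    index=pd.date_range("1900-01-01", periods=3, freq=hour),
)


def with_keys(frame):
    """Return a copy with month/day/hour columns derived from the index."""
    return frame.assign(
        month=frame.index.month, day=frame.index.day, hour=frame.index.hour
    )


keys = ["month", "day", "hour"]
merged_df = (
    with_keys(df1)
    .rename_axis("datetimeindex")
    .reset_index()                       # keep df1's timestamps as a column
    .merge(with_keys(df2), on=keys, how="left")
    .set_index("datetimeindex")          # restore them as the index
    .drop(columns=keys)
)

print(merged_df)
```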
Another option, probably not as efficient because pd.to_datetime can be slow, is to rewrite the year in df2's index:
df2['NewIndex'] = pd.to_datetime(df2.index)
df2['NewIndex'] = df2['NewIndex'].apply(lambda x: x.replace(year=2014))
df2.set_index('NewIndex',inplace=True)
Then just do a merge on the whole index.
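Put together, the year-replacement route could look like the sketch below. It assumes 2014 is the year used in df1 and works on a copy (the hypothetical df2_2014) so the original df2 is left alone:

```python
import pandas as pd

hour = pd.Timedelta(hours=1)
df1 = pd.DataFrame(
    {"value": [204.501667, 125.345, 119.66]},
    index=pd.date_range("2014-01-01", periods=3, freq=hour),
)
df2 = pd.DataFrame(
    {"temperature": [48.2, 30.2, 42.8]},
    index=pd.date_range("1900-01-01", periods=3, freq=hour),
)

# Move df2 from the filler year onto df1's year; month, day and time stay as-is.
# Timestamp.replace raises ValueError for Feb 29 if the target year is not a leap year.
df2_2014 = df2.copy()
df2_2014.index = df2_2014.index.map(lambda ts: ts.replace(year=2014))

# With matching timestamps, a plain index join is enough.
merged_df = df1.join(df2_2014, how="left")

print(merged_df)
```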