Conditional statements on time series data in Python

I'm trying to apply conditional statements to time series data. Is there a way to set "t_value" to zero if the time is between 1) 00:00:00 and 02:00:00 or 2) 04:00:00 and 06:00:00?
t_value
2019-11-24 00:00:00 4.0
2019-11-24 01:00:00 7.8
2019-11-24 02:00:00 95.1
2019-11-24 03:00:00 78.4
2019-11-24 04:00:00 8.0
2019-11-24 05:00:00 17.50
2019-11-24 06:00:00 55.00
2019-11-24 07:00:00 66.00
2019-11-25 00:00:00 21.00
2019-11-25 01:00:00 12.40
if-else and np.where are probable options, but I'm unsure how to implement the conditions on hours.

Use between_time to get the datetimes between the specified times, then use loc to assign the new values.
I'll use @Ben.T's sample data:
import pandas as pd
df = pd.DataFrame({'t_value':range(1,11)},
index=pd.date_range('2020-05-17 00:00:00', periods=10, freq='1H'))
#get the time indices for the different ranges
m1 = df.between_time('00:00:00','02:00:00').index
m2 = df.between_time('04:00:00','06:00:00').index
#assign 0 to the t_value rows that fall in either range
#(Index.union is used because recent pandas versions no longer treat | on indexes as a set union)
df.loc[m1.union(m2), 't_value'] = 0
print(df)
t_value
2020-05-17 00:00:00 0
2020-05-17 01:00:00 0
2020-05-17 02:00:00 0
2020-05-17 03:00:00 4
2020-05-17 04:00:00 0
2020-05-17 05:00:00 0
2020-05-17 06:00:00 0
2020-05-17 07:00:00 8
2020-05-17 08:00:00 9
2020-05-17 09:00:00 10

You can access the time of day from your datetime index with time and create a mask for each condition. Then use loc, combining the masks with | (logical or).
#sample data
df = pd.DataFrame({'t_value':range(1,11)},
index=pd.date_range('2020-05-17 00:00:00', periods=10, freq='1H'))
# masks
m1 = ((df.index.time>=pd.to_datetime('00:00:00').time())
& (df.index.time<=pd.to_datetime('02:00:00').time()))
m2 = ((df.index.time>=pd.to_datetime('04:00:00').time())
& (df.index.time<=pd.to_datetime('06:00:00').time()))
#set the value to 0
df.loc[m1|m2, 't_value'] = 0
print (df)
t_value
2020-05-17 00:00:00 0
2020-05-17 01:00:00 0
2020-05-17 02:00:00 0
2020-05-17 03:00:00 4
2020-05-17 04:00:00 0
2020-05-17 05:00:00 0
2020-05-17 06:00:00 0
2020-05-17 07:00:00 8
2020-05-17 08:00:00 9
2020-05-17 09:00:00 10
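Since the question mentions np.where, here is a minimal sketch of that route on the question's own data. It assumes the frame has a DatetimeIndex. The helper name in_window is made up for illustration; it relies on DatetimeIndex.indexer_between_time, which returns the positions of rows whose time of day lies in the given window (both endpoints included by default).

```python
import numpy as np
import pandas as pd

# Data from the question: hourly readings across two days.
idx = pd.to_datetime([
    "2019-11-24 00:00:00", "2019-11-24 01:00:00", "2019-11-24 02:00:00",
    "2019-11-24 03:00:00", "2019-11-24 04:00:00", "2019-11-24 05:00:00",
    "2019-11-24 06:00:00", "2019-11-24 07:00:00", "2019-11-25 00:00:00",
    "2019-11-25 01:00:00",
])
df = pd.DataFrame(
    {"t_value": [4.0, 7.8, 95.1, 78.4, 8.0, 17.5, 55.0, 66.0, 21.0, 12.4]},
    index=idx,
)


def in_window(index, start, end):
    """Hypothetical helper: True where the time of day lies in [start, end]."""
    mask = np.zeros(len(index), dtype=bool)
    mask[index.indexer_between_time(start, end)] = True
    return mask


# Combine the two windows with a logical or, then zero the matching rows.
mask = in_window(df.index, "00:00", "02:00") | in_window(df.index, "04:00", "06:00")
df["t_value"] = np.where(mask, 0, df["t_value"])
print(df)
```

Only the 03:00 and 07:00 rows keep their original values here, because the windows are matched on time of day regardless of the date.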

Related

Plotting a graph of one day from a year's data

So I have a dataset that has electricity load over 24 hours:
Time_of_Day = loadData.groupby(loadData.index.hour).mean()
Time_of_Day
Time Load
2019-01-01 01:00:00 38.045
2019-01-01 02:00:00 30.675
2019-01-01 03:00:00 22.570
2019-01-01 04:00:00 22.153
2019-01-01 05:00:00 21.085
... ...
2019-12-31 20:00:00 65.565
2019-12-31 21:00:00 53.513
2019-12-31 22:00:00 49.096
2019-12-31 23:00:00 44.409
2020-01-01 00:00:00 45.744
How do I plot a random day (24 hours) from the 8760 hours, please?
With the following toy dataframe:
import pandas as pd
import random
df = pd.DataFrame({"Time": pd.date_range(start="1/1/2019", end="12/31/2019", freq="H")})
df["Load"] = [round(random.random() * 100, 2) for _ in range(df.shape[0])]
Time Load
0 2019-01-01 00:00:00 53.36
1 2019-01-01 01:00:00 34.20
2 2019-01-01 02:00:00 64.19
3 2019-01-01 03:00:00 89.18
4 2019-01-01 04:00:00 27.82
... ... ...
8732 2019-12-30 20:00:00 38.26
8733 2019-12-30 21:00:00 49.66
8734 2019-12-30 22:00:00 64.15
8735 2019-12-30 23:00:00 23.97
8736 2019-12-31 00:00:00 3.72
[8737 rows x 2 columns]
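Here is one way to do it using the choice function from the Python standard library's random module:
# In Jupyter cell
# Pick one calendar date that actually exists in the data, then plot its rows.
# (Choosing the month and the day independently can produce a date that does
# not exist, such as February 30, which would leave nothing to plot.)
day = random.choice(df["Time"].dt.date.unique().tolist())
df[df["Time"].dt.date == day].plot(x="Time")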
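If the timestamps are the index rather than a column (as in the question's loadData), one day can also be pulled out with partial string indexing. This is a sketch under assumptions: the random load values, the variable names load, days and one_day are all illustrative, and the plotting call is left as a comment so the snippet runs without a plotting backend.

```python
import random

import numpy as np
import pandas as pd

# Illustrative stand-in for the real load data: one value per hour of 2019.
rng = np.random.default_rng(0)
idx = pd.date_range("2019-01-01 00:00", "2019-12-31 23:00", freq="60min")
load = pd.Series(rng.uniform(0, 100, len(idx)), index=idx, name="Load")

# All distinct calendar days present in the index (midnight timestamps).
days = load.index.normalize().unique()

# Pick one day at random and select its rows by date string.
day = random.choice(list(days))
one_day = load.loc[day.strftime("%Y-%m-%d")]

print(day.date(), len(one_day))
# one_day.plot() would then draw the 24 hourly values for that day.
```

Because the day is drawn from dates that exist in the index, the selection is never empty.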

Using time and a numerical value in conditional statements to create a categorical column in Python

I'm trying to write an if statement that uses both the time and a numerical value to create a new categorical column.
Condition: if the time is between 05:00:00 and 19:00:00 and t_value > 0 and t_value <= 13, then classify as "C", else "IC".
If the time is not in that range, classify as NA.
Example Input
t_value
2020-05-17 00:00:00 0
2020-05-17 01:00:00 0
2020-05-17 02:00:00 0
2020-05-17 03:00:00 0
2020-05-17 04:00:00 0
2020-05-17 05:00:00 0
2020-05-17 06:00:00 0
2020-05-17 07:00:00 8
2020-05-17 08:00:00 9
2020-05-17 09:00:00 10
2020-05-17 10:00:00 11
2020-05-17 11:00:00 12
I'm unsure of the approach to take in this regard
Expected Output
t_value C/IC
2020-05-17 00:00:00 0 NA
2020-05-17 01:00:00 0 NA
2020-05-17 02:00:00 0 NA
2020-05-17 03:00:00 0 NA
2020-05-17 04:00:00 0 NA
2020-05-17 05:00:00 0 IC
2020-05-17 06:00:00 0 IC
2020-05-17 07:00:00 8 C
2020-05-17 08:00:00 9 C
2020-05-17 09:00:00 10 C
2020-05-17 10:00:00 11 C
2020-05-17 11:00:00 12 C
#convert to datetime index
df.index = pd.to_datetime(df.index)
#get condition for time boundary
cond1 = df.between_time( '05:00:00', '19:00:00')
print(cond1.index)
DatetimeIndex(['2020-05-17 05:00:00', '2020-05-17 06:00:00',
'2020-05-17 07:00:00', '2020-05-17 08:00:00',
'2020-05-17 09:00:00', '2020-05-17 10:00:00',
'2020-05-17 11:00:00'],
dtype='datetime64[ns]', freq=None)
#get index to match the t_value conditions
#indices that match time boundary, but not t_value boundary
#(the whole t_value condition is negated, so values above 13 are also "IC")
ic = cond1.loc[~((cond1.t_value.gt(0)) & (cond1.t_value.le(13)))].index
#indices that match time boundary and t_value boundary
c = cond1.loc[(cond1.t_value.gt(0)) & (cond1.t_value.le(13))].index
#assign value
df.loc[c,'C/IC'] = "C"
df.loc[ic,'C/IC'] = "IC"
print(df)
t_value C/IC
2020-05-17 00:00:00 0 NaN
2020-05-17 01:00:00 0 NaN
2020-05-17 02:00:00 0 NaN
2020-05-17 03:00:00 0 NaN
2020-05-17 04:00:00 0 NaN
2020-05-17 05:00:00 0 IC
2020-05-17 06:00:00 0 IC
2020-05-17 07:00:00 8 C
2020-05-17 08:00:00 9 C
2020-05-17 09:00:00 10 C
2020-05-17 10:00:00 11 C
2020-05-17 11:00:00 12 C
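The same classification can be sketched with two boolean masks instead of index lookups. This is only one possible arrangement, assuming a DatetimeIndex; the names in_window and in_range are illustrative. Rows outside the time window are left as missing values.

```python
import pandas as pd

# Example input from the question.
idx = pd.date_range("2020-05-17 00:00", periods=12, freq="60min")
df = pd.DataFrame({"t_value": [0, 0, 0, 0, 0, 0, 0, 8, 9, 10, 11, 12]}, index=idx)

# True for rows whose time of day lies in 05:00-19:00 (endpoints included).
in_window = pd.Series(False, index=df.index)
in_window.iloc[df.index.indexer_between_time("05:00", "19:00")] = True

# True for rows with 0 < t_value <= 13.
in_range = df["t_value"].gt(0) & df["t_value"].le(13)

# Start with an all-missing object column, then fill it in two passes:
# everything inside the window is "IC" unless the value condition also holds.
df["C/IC"] = pd.Series(index=df.index, dtype="object")
df.loc[in_window, "C/IC"] = "IC"
df.loc[in_window & in_range, "C/IC"] = "C"
print(df)
```

Assigning "IC" first and then overwriting with "C" keeps the "else" branch simple: any in-window row that fails the value test, including values above 13, stays "IC".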

How to fill the first date in the column?

I have a df:
dates values
2020-01-01 00:15:00 38.61487
2020-01-01 00:30:00 36.905204
2020-01-01 00:45:00 35.136584
2020-01-01 01:00:00 33.60378
2020-01-01 01:15:00 32.306791999999994
2020-01-01 01:30:00 31.304574
I am creating a new column named start as follows:
df = df.rename(columns={'dates': 'end'})
df['start']= df['end'].shift(1)
When I do this, I get the following:
end values start
2020-01-01 00:15:00 38.61487 NaT
2020-01-01 00:30:00 36.905204 2020-01-01 00:15:00
2020-01-01 00:45:00 35.136584 2020-01-01 00:30:00
2020-01-01 01:00:00 33.60378 2020-01-01 00:45:00
2020-01-01 01:15:00 32.306791999999994 2020-01-01 01:00:00
2020-01-01 01:30:00 31.304574 2020-01-01 01:15:00
I want to fill that NaT value with
2020-01-01 00:00:00
How can this be done?
Use Series.fillna with datetimes, e.g. by Timestamp:
df['start']= df['end'].shift().fillna(pd.Timestamp('2020-01-01'))
Or if pandas 0.24+ with fill_value parameter:
df['start']= df['end'].shift(fill_value=pd.Timestamp('2020-01-01'))
If the datetimes are regular, always 15 minutes apart, you can instead subtract an offset with offsets.DateOffset:
df['start']= df['end'] - pd.offsets.DateOffset(minutes=15)
print (df)
end values start
0 2020-01-01 00:15:00 38.614870 2020-01-01 00:00:00
1 2020-01-01 00:30:00 36.905204 2020-01-01 00:15:00
2 2020-01-01 00:45:00 35.136584 2020-01-01 00:30:00
3 2020-01-01 01:00:00 33.603780 2020-01-01 00:45:00
4 2020-01-01 01:15:00 32.306792 2020-01-01 01:00:00
5 2020-01-01 01:30:00 31.304574 2020-01-01 01:15:00
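How about this?
# build the frame directly from the date range (assigning into an empty frame with loc is unreliable)
df = pd.DataFrame({'end': pd.date_range(start=pd.Timestamp(2019,1,1,0,15), end=pd.Timestamp(2019,1,2), freq='15min')})
df['start'] = df['end'].shift(1)
# spacing between consecutive rows (15 minutes here)
delta = df.loc[df.index[3], 'end'] - df.loc[df.index[2], 'end']
# fill the first start by stepping back one interval from the second start
df.loc[df.index[0], 'start'] = df.loc[df.index[1], 'start'] - delta
df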
df
end start
0 2019-01-01 00:15:00 2019-01-01 00:00:00
1 2019-01-01 00:30:00 2019-01-01 00:15:00
2 2019-01-01 00:45:00 2019-01-01 00:30:00
3 2019-01-01 01:00:00 2019-01-01 00:45:00
4 2019-01-01 01:15:00 2019-01-01 01:00:00
... ... ...
91 2019-01-01 23:00:00 2019-01-01 22:45:00
92 2019-01-01 23:15:00 2019-01-01 23:00:00
93 2019-01-01 23:30:00 2019-01-01 23:15:00
94 2019-01-01 23:45:00 2019-01-01 23:30:00
95 2019-01-02 00:00:00 2019-01-01 23:45:00
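The two ideas above (fill_value on shift, and deriving the step from the data) can be combined so that the fill value is not hard-coded. A minimal sketch on the question's data, assuming the rows are sorted and evenly spaced; the variable name step is illustrative.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "end": pd.to_datetime([
        "2020-01-01 00:15:00", "2020-01-01 00:30:00", "2020-01-01 00:45:00",
        "2020-01-01 01:00:00", "2020-01-01 01:15:00", "2020-01-01 01:30:00",
    ]),
    "values": [38.61487, 36.905204, 35.136584, 33.60378, 32.306792, 31.304574],
})

# Smallest gap between consecutive end times (15 minutes for this data).
step = df["end"].diff().min()

# Shift the end times down one row and fill the gap at the top with
# "first end minus one step".
df["start"] = df["end"].shift(fill_value=df["end"].iloc[0] - step)
print(df)
```

If the spacing were irregular, the first start would still be one minimum step before the first end, which may or may not be what is wanted.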

Flagging list of datetimes within date ranges in pandas dataframe

I've looked around (e.g. "Python - Locating the closest timestamp") but can't find anything on this.
I have a list of datetimes, and a dataframe containing 10k + rows, of start and end times (formatted as datetimes).
The dataframe is effectively listing parameters for runs of an instrument.
The list describes times from an alarm event.
The datetime list items are all within a row (i.e. between a start and end time) in the dataframe. Is there an easy way to locate the rows which would contain the timeframe within which the alarm time would be? (sorry for poor wording there!)
eg.
for i in alarms:
    df.loc[(df.start_time < i) & (df.end_time > i), 'Flag'] = 'Alarm'
(this didn't work but shows my approach)
Example datasets
# making list of datetimes for the alarms
df = pd.DataFrame({'Alarms':["18/07/19 14:56:21", "19/07/19 15:05:15", "20/07/19 15:46:00"]})
df['Alarms'] = pd.to_datetime(df['Alarms'])
alarms = list(df.Alarms.unique())
# dataframe of runs containing start and end times
n=33
rng1 = pd.date_range('2019-07-18', '2019-07-22', periods=n)
rng2 = pd.date_range('2019-07-18 03:00:00', '2019-07-22 03:00:00', periods=n)
df = pd.DataFrame({ 'start_date': rng1, 'end_Date': rng2})
Here a flag should go against rows (well, indices) 4, 13 and 21.
You can use pandas.IntervalIndex here:
# Create and set IntervalIndex
intervals = pd.IntervalIndex.from_arrays(df.start_date, df.end_Date)
df = df.set_index(intervals)
# Update using loc
df.loc[alarms, 'flag'] = 'alarm'
# Finally, reset_index
df = df.reset_index(drop=True)
[out]
start_date end_Date flag
0 2019-07-18 00:00:00 2019-07-18 03:00:00 NaN
1 2019-07-18 03:00:00 2019-07-18 06:00:00 NaN
2 2019-07-18 06:00:00 2019-07-18 09:00:00 NaN
3 2019-07-18 09:00:00 2019-07-18 12:00:00 NaN
4 2019-07-18 12:00:00 2019-07-18 15:00:00 alarm
5 2019-07-18 15:00:00 2019-07-18 18:00:00 NaN
6 2019-07-18 18:00:00 2019-07-18 21:00:00 NaN
7 2019-07-18 21:00:00 2019-07-19 00:00:00 NaN
8 2019-07-19 00:00:00 2019-07-19 03:00:00 NaN
9 2019-07-19 03:00:00 2019-07-19 06:00:00 NaN
10 2019-07-19 06:00:00 2019-07-19 09:00:00 NaN
11 2019-07-19 09:00:00 2019-07-19 12:00:00 NaN
12 2019-07-19 12:00:00 2019-07-19 15:00:00 NaN
13 2019-07-19 15:00:00 2019-07-19 18:00:00 alarm
14 2019-07-19 18:00:00 2019-07-19 21:00:00 NaN
15 2019-07-19 21:00:00 2019-07-20 00:00:00 NaN
16 2019-07-20 00:00:00 2019-07-20 03:00:00 NaN
17 2019-07-20 03:00:00 2019-07-20 06:00:00 NaN
18 2019-07-20 06:00:00 2019-07-20 09:00:00 NaN
19 2019-07-20 09:00:00 2019-07-20 12:00:00 NaN
20 2019-07-20 12:00:00 2019-07-20 15:00:00 NaN
21 2019-07-20 15:00:00 2019-07-20 18:00:00 alarm
22 2019-07-20 18:00:00 2019-07-20 21:00:00 NaN
23 2019-07-20 21:00:00 2019-07-21 00:00:00 NaN
24 2019-07-21 00:00:00 2019-07-21 03:00:00 NaN
25 2019-07-21 03:00:00 2019-07-21 06:00:00 NaN
26 2019-07-21 06:00:00 2019-07-21 09:00:00 NaN
27 2019-07-21 09:00:00 2019-07-21 12:00:00 NaN
28 2019-07-21 12:00:00 2019-07-21 15:00:00 NaN
29 2019-07-21 15:00:00 2019-07-21 18:00:00 NaN
30 2019-07-21 18:00:00 2019-07-21 21:00:00 NaN
31 2019-07-21 21:00:00 2019-07-22 00:00:00 NaN
32 2019-07-22 00:00:00 2019-07-22 03:00:00 NaN
You named your columns start_date and end_Date, but your for loop uses start_time and end_time.
Try this:
import pandas as pd
df = pd.DataFrame({'Alarms': ["18/07/19 14:56:21", "19/07/19 15:05:15", "20/07/19 15:46:00"]})
df['Alarms'] = pd.to_datetime(df['Alarms'])
alarms = list(df.Alarms.unique())
# dataframe of runs containing start and end times
n = 33
rng1 = pd.date_range('2019-07-18', '2019-07-22', periods=n)
rng2 = pd.date_range('2019-07-18 03:00:00', '2019-07-22 03:00:00', periods=n)
df = pd.DataFrame({'start_date': rng1, 'end_Date': rng2})
for i in alarms:
    df.loc[(df.start_date < i) & (df.end_Date > i), 'Flag'] = 'Alarm'
print(df[df['Flag']=='Alarm']['Flag'])
Output:
4 Alarm
13 Alarm
21 Alarm
Name: Flag, dtype: object
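The loop can also be replaced by a single broadcast comparison. This is a sketch, not a drop-in replacement: it builds a (runs x alarms) boolean matrix, so memory grows with both sizes, and it parses the alarm strings with an explicit day-first format, which is an assumption about the data. The names starts, ends, alarm_times and hit are illustrative.

```python
import pandas as pd

# Alarm times, parsed with an explicit day/month/year format (assumed).
alarms = pd.to_datetime(
    ["18/07/19 14:56:21", "19/07/19 15:05:15", "20/07/19 15:46:00"],
    format="%d/%m/%y %H:%M:%S",
)

# Runs with start and end times, as in the question.
n = 33
df = pd.DataFrame({
    "start_date": pd.date_range("2019-07-18", "2019-07-22", periods=n),
    "end_Date": pd.date_range("2019-07-18 03:00:00", "2019-07-22 03:00:00", periods=n),
})

# Column vectors of run bounds against a row vector of alarm times.
starts = df["start_date"].to_numpy()[:, None]
ends = df["end_Date"].to_numpy()[:, None]
alarm_times = alarms.to_numpy()[None, :]

# A run is flagged if any alarm falls strictly between its start and end.
hit = ((starts < alarm_times) & (alarm_times < ends)).any(axis=1)
df.loc[hit, "Flag"] = "Alarm"
print(df.index[hit].tolist())
```

On this example the flagged rows are the same three as in the loop version above.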

How can I get a conditional hourly mean in pandas?

I have the dataframe below, and I want to make an hourly mean dataframe,
with the condition that for every hour the mean is calculated only over the values at 00:15:00~00:45:00.
date/time are a MultiIndex.
aaa
date time
2017-01-01 00:00:00 146.88
00:15:00 143.28
00:30:00 143.28
00:45:00 141.12
01:00:00 134.64
01:15:00 132.48
01:30:00 136.80
01:45:00 138.24
02:00:00 131.76
02:15:00 131.04
02:30:00 134.64
02:45:00 139.68
03:00:00 136.08
03:15:00 132.48
03:30:00 132.48
03:45:00 139.68
04:00:00 134.64
04:15:00 131.04
04:30:00 160.56
04:45:00 177.12
...
The result should look like the one below. How can I do it?
aaa
date time
2017-01-01 00:00:00 146.88
01:00:00 134.64
02:00:00 131.76
03:00:00 136.08
04:00:00 134.64
...
It seems you only need to select the rows whose times end in 00:00:
df2 = df1[df1.index.get_level_values(1).astype(str).str.endswith('00:00')]
print (df2)
aaa
date time
2017-01-01 00:00:00 146.88
01:00:00 134.64
02:00:00 131.76
03:00:00 136.08
04:00:00 134.64
But if you need the mean of only the 00:15-00:45 values, it is more complicated:
lvl1 = pd.Series(df1.index.get_level_values(1))
m = ~lvl1.astype(str).str.endswith('00:00')
lvl1new = lvl1.mask(m).ffill()
df1.index = pd.MultiIndex.from_arrays([df1.index.get_level_values(0),
lvl1new.where(m)], names=df1.index.names)
print (df1)
aaa
date time
2017-01-01 NaN 146.88
00:00:00 143.28
00:00:00 143.28
00:00:00 141.12
NaN 134.64
01:00:00 132.48
01:00:00 136.80
01:00:00 138.24
NaN 131.76
02:00:00 131.04
02:00:00 134.64
02:00:00 139.68
NaN 136.08
03:00:00 132.48
03:00:00 132.48
03:00:00 139.68
NaN 134.64
04:00:00 131.04
04:00:00 160.56
04:00:00 177.12
df = df1['aaa'].groupby(level=[0,1]).mean()
print (df)
date time
2017-01-01 00:00:00 142.56
01:00:00 135.84
02:00:00 135.12
03:00:00 134.88
04:00:00 156.24
Name: aaa, dtype: float64
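If the data can be held on a single DatetimeIndex instead of the (date, time) MultiIndex, the same means can be sketched more directly: drop the on-the-hour rows, then group the rest by the hour they belong to. The flat index is an assumption that differs from the question's layout, and the names s, quarters and hourly are illustrative.

```python
import pandas as pd

# The question's values on a flat 15-minute DatetimeIndex (assumed layout).
ts = pd.date_range("2017-01-01 00:00", periods=20, freq="15min")
s = pd.Series(
    [146.88, 143.28, 143.28, 141.12, 134.64, 132.48, 136.80, 138.24,
     131.76, 131.04, 134.64, 139.68, 136.08, 132.48, 132.48, 139.68,
     134.64, 131.04, 160.56, 177.12],
    index=ts, name="aaa",
)

# Keep only the :15, :30 and :45 rows.
quarters = s[s.index.minute != 0]

# Group each remaining row under the start of its hour and average.
hourly = quarters.groupby(quarters.index.floor("60min")).mean()
print(hourly.round(2))
```

The rounded means agree with the groupby result shown above (142.56, 135.84, 135.12, 134.88, 156.24).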
