generate a plot based on time-delta value - python

I have a data frame which captures data sent from a server. The server sends data at least once every 5 minutes; if it doesn't send data for more than 5 minutes, the time until data arrives again is considered a blackout. I want to visualize these blackouts in a graph. The data frame looks like:
timestamp temperature
2019-06-03 14:16:31.149132 27.17
2019-06-03 14:21:34.732911 27.13
2019-06-03 14:37:20.437143 27.16
2019-06-03 14:42:15.516416 27.13
2019-06-03 14:51:26.167553 27.19
2019-06-03 14:56:31.244862 27.02
2019-06-03 15:07:30.519727 27.1
2019-06-03 15:12:57.319953 27.12
2019-06-03 15:17:56.256638 27.12
I have calculated the time difference between consecutive timestamps, marked the blackouts, and calculated the blackout time.
code:
import datetime
import numpy as np

df['TimeDelta'] = df['timestamp'] - df['timestamp'].shift()
df['blackout'] = np.where(df['TimeDelta'] > datetime.timedelta(minutes=5), 1, 0)
df['blackoutTime'] = np.where(df['blackout'] > 0, df['TimeDelta'] - datetime.timedelta(minutes=5), 0)
df['blackoutMins'] = df['blackoutTime'] / np.timedelta64(1, 'm')
which gives 4 additional columns
TimeDelta blackout blackoutTime blackoutMins
0 days 00:04:57.310512000 0 0 days 00:00:00.000000000 0.0
0 days 00:05:03.583779000 1 0 days 00:00:03.583779000 0.05972965
0 days 00:15:45.704232000 1 0 days 00:10:45.704232000 10.7617372
0 days 00:04:55.079273000 0 0 days 00:00:00.000000000 0.0
0 days 00:09:10.651137000 1 0 days 00:04:10.651137000 4.17751895
0 days 00:05:05.077309000 1 0 days 00:00:05.077309000 0.08462181666666667
0 days 00:10:59.274865000 1 0 days 00:05:59.274865000 5.9879144166666665
0 days 00:05:26.800226000 1 0 days 00:00:26.800226000 0.44667043333333334
0 days 00:04:58.936685000 0 0 days 00:00:00.000000000 0.0
0 days 00:05:16.684317000 1 0 days 00:00:16.684317000 0.27807195
0 days 00:05:02.304786000 1 0 days 00:00:02.304786000 0.0384131
So what I am trying to do is visualize the blackouts, with time on the x-axis and the blackout duration on the y-axis. Can someone help with how to do this visualization?

You want plt.step against the original timestamp:
df['blackout'] = df.timestamp.diff().gt('5min').astype(int)
plt.step(df.timestamp, df.blackout, c='red')
Output: (red step plot of the 0/1 blackout flag against timestamp)
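If the y-axis should instead show how long each blackout lasted, the same diff can be turned into minutes directly. A minimal runnable sketch, using the first five timestamps from the question with sub-second parts dropped:

```python
import pandas as pd

df = pd.DataFrame({'timestamp': pd.to_datetime([
    '2019-06-03 14:16:31', '2019-06-03 14:21:34', '2019-06-03 14:37:20',
    '2019-06-03 14:42:15', '2019-06-03 14:51:26'])})

# Minutes past the 5-minute allowance; negatives (no blackout) clip to 0.
df['blackoutMins'] = ((df['timestamp'].diff() - pd.Timedelta(minutes=5))
                      .dt.total_seconds().div(60).clip(lower=0).fillna(0))

# Then plot it the same way: plt.step(df['timestamp'], df['blackoutMins'], c='red')
```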
Related

percentage difference of datetime object

I want to create a new column which contains the values of the diff (s) column, but as percentages.
Finish Time diff (s)
0 1900-01-01 00:42:43.500 0 days 00:00:00
1 1900-01-01 00:44:01.200 0 days 00:01:17
2 1900-01-01 00:44:06.500 0 days 00:01:23
3 1900-01-01 00:44:29.500 0 days 00:01:46
4 1900-01-01 00:44:47.500 0 days 00:02:04
to further understand the data:
df["diff(s)"] = df["Finish Time"] - min(df["Finish Time"])
Finish Time datetime64[ns]
diff (s) timedelta64[ns]
dtype: object
df["diff(%)"] = (df["Finish Time"] / min(df["Finish Time"])) * 100
-> results in this error
TypeError: cannot perform __truediv__ with this index type:
DatetimeArray
It depends on how the percentages are defined - if you need to divide by the summed timedeltas:
df["diff(s)"] = df["Finish Time"] - df["Finish Time"].min()
df["diff(%)"] = (df["diff(s)"] / df["diff(s)"].sum()) * 100
print (df)
Finish Time diff(s) diff(%)
0 1900-01-01 00:42:43.500 0 days 00:00:00 0.000000
1 1900-01-01 00:44:01.200 0 days 00:01:17.700000 19.887382
2 1900-01-01 00:44:06.500 0 days 00:01:23 21.243921
3 1900-01-01 00:44:29.500 0 days 00:01:46 27.130791
4 1900-01-01 00:44:47.500 0 days 00:02:04 31.737906
Or using Series.pct_change:
df["diff(%)"] = df["diff(s)"].pct_change() * 100
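The working approach can be checked end to end. A self-contained sketch on the first three sample rows; note that dividing a timedelta by a timedelta yields a plain float, which is why this succeeds where dividing two datetimes raised the TypeError:

```python
import pandas as pd

df = pd.DataFrame({'Finish Time': pd.to_datetime(
    ['1900-01-01 00:42:43.500', '1900-01-01 00:44:01.200',
     '1900-01-01 00:44:06.500'])})

# Timedelta / Timedelta -> float, so the percentage is well defined.
df['diff(s)'] = df['Finish Time'] - df['Finish Time'].min()
df['diff(%)'] = df['diff(s)'] / df['diff(s)'].sum() * 100
```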

Calculate standard deviation columns for timedelta elements

I have the following dataframe in Python:
ID  country_ID  visit_time
0   ESP         10 days 12:03:00
0   ESP         5 days 02:03:00
0   ENG         5 days 10:02:00
1   ENG         3 days 08:05:03
1   ESP         1 days 03:02:00
1   ENG         2 days 07:01:03
2   ENG         0 days 12:01:02
For each ID I want the standard deviation of visit_time within each country_ID group, as two columns: std_visit_ESP (standard deviation of visit_time where country_ID = ESP, per ID) and std_visit_ENG (likewise for ENG).
ID  std_visit_ESP    std_visit_ENG
0   2 days 17:00:00  0 days 00:00:00
1   0 days 00:00:00  0 days 12:32:00
2   NaT              0 days 00:00:00
With the groupby method for the mean, you can specify the parameter numeric_only = False, but the std method of groupby does not include this option.
My idea is to convert the timedelta to seconds, calculate the standard deviation and then convert it back to timedelta. Here is an example:
from datetime import timedelta
import numpy as np
import pandas as pd

td1 = timedelta(days=10, hours=12, minutes=3).total_seconds()
td2 = timedelta(days=5, hours=2, minutes=3).total_seconds()
arr = [td1, td2]
var = np.std(arr)
show_s = pd.to_timedelta(var, unit='s')
print(show_s)
I don't know how to use this with groupby to get the desired result. I am grateful for your help.
Use GroupBy.std and pd.to_timedelta:
total_seconds = pd.to_timedelta(
    df['visit_time'].dt.total_seconds()
        .groupby([df['ID'], df['country_ID']]).std(),
    unit='s').unstack().fillna(pd.Timedelta(days=0))
print(total_seconds)
country_ID ENG ESP
ID
0 0 days 00:00:00 3 days 19:55:25.973595304
1 0 days 17:43:29.315934274 0 days 00:00:00
2 0 days 00:00:00 0 days 00:00:00
If I understand correctly, this should work for you:
stddevs = (df['visit_time'].dt.total_seconds()
           .groupby([df['country_ID']]).std()
           .apply(lambda x: pd.Timedelta(seconds=x)))
Output:
>>> stddevs
country_ID
ENG 2 days 01:17:43.835702
ESP 4 days 16:40:16.598773
Name: visit_time, dtype: timedelta64[ns]
Formatting:
stddevs = (df['visit_time'].dt.total_seconds()
           .groupby([df['country_ID']]).std()
           .apply(lambda x: pd.Timedelta(seconds=x))
           .to_frame().T
           .add_prefix('std_visit_')
           .reset_index(drop=True)
           .rename_axis(None, axis=1))
Output:
>>> stddevs
std_visit_ENG std_visit_ESP
0 2 days 01:17:43.835702 4 days 16:40:16.598773
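For reference, the first answer as one runnable sketch using the question's sample frame (unit lowercased to 's', the current pandas spelling). Single-member groups produce NaN from the sample std and get filled with zero, matching the expected output shape:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [0, 0, 0, 1, 1, 1, 2],
    'country_ID': ['ESP', 'ESP', 'ENG', 'ENG', 'ESP', 'ENG', 'ENG'],
    'visit_time': pd.to_timedelta(
        ['10 days 12:03:00', '5 days 02:03:00', '5 days 10:02:00',
         '3 days 08:05:03', '1 days 03:02:00', '2 days 07:01:03',
         '0 days 12:01:02'])})

# std per (ID, country_ID) on seconds, back to timedelta, one column per country
out = pd.to_timedelta(
    df['visit_time'].dt.total_seconds()
      .groupby([df['ID'], df['country_ID']]).std(),
    unit='s').unstack().fillna(pd.Timedelta(0))
```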

Get the mean of timedelta column

I have a column made of timedelta elements in a dataframe:
time_to_return_ask
0 0 days 00:00:00.046000
1 0 days 00:00:00.204000
2 0 days 00:00:00.336000
3 0 days 00:00:00.362000
4 0 days 00:00:00.109000
...
3240 0 days 00:00:00.158000
3241 0 days 00:00:00.028000
3242 0 days 00:00:00.130000
3243 0 days 00:00:00.035000
3244 0
Name: time_to_return_ask, Length: 3245, dtype: object
I tried to apply the solution from another question by taking the values of the elements, but I am already stuck. Any idea? Thanks!
What I tried:
df['time_to_return_ask'].values.astype(np.int64)
means = dropped.groupby('ts').mean()
means['new'] = pd.to_timedelta(means['new'])
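No answer was captured for this question, but one likely fix, sketched under the assumption that the column is object dtype because it mixes timedelta values with a plain 0 (as row 3244 suggests): coerce everything with pd.to_timedelta first, after which .mean() works directly.

```python
import pandas as pd

# Small stand-in for the mixed-dtype column from the question.
s = pd.Series(['0 days 00:00:00.046000', '0 days 00:00:00.204000',
               '0 days 00:00:00.336000', 0],
              name='time_to_return_ask')

s = pd.to_timedelta(s)  # object -> timedelta64[ns]; the bare 0 becomes 0 days
mean = s.mean()
```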

Creating Bin for timestamp column

I am trying to create a proper bin for a timestamp interval column,
using code such as
df['Bin'] = pd.cut(df['interval_length'], bins=pd.to_timedelta(['00:00:00','00:10:00','00:20:00','00:30:00','00:40:00','00:50:00','00:60:00']))
The Resulting df looks like:
time_interval | bin
00:17:00 (0 days 00:10:00, 0 days 00:20:00]
01:42:00 NaN
00:15:00 (0 days 00:10:00, 0 days 00:20:00]
00:00:00 NaN
00:06:00 (0 days 00:00:00, 0 days 00:10:00]
Which is a little off: I want just the time value, not the days, and I want the last bin's upper limit to be 60 minutes or inf (or more).
Desired Output:
time_interval | bin
00:17:00 (00:10:00,00:20:00]
01:42:00 (00:60:00,inf]
00:15:00 (00:10:00,00:20:00]
00:00:00 (00:00:00,00:10:00]
00:06:00 (00:00:00,00:10:00]
Thanks for looking!
There is no inf for timedeltas in pandas, so use the maximal Timedelta instead. Also pass include_lowest=True so the lowest value is included. If you want the bins as timedeltas:
b = pd.to_timedelta(['00:00:00', '00:10:00', '00:20:00',
                     '00:30:00', '00:40:00', '00:50:00', '00:60:00'])
b = b.append(pd.Index([pd.Timedelta.max]))
df['Bin'] = pd.cut(df['time_interval'], include_lowest=True, bins=b)
print (df)
time_interval Bin
0 00:17:00 (0 days 00:10:00, 0 days 00:20:00]
1 01:42:00 (0 days 01:00:00, 106751 days 23:47:16.854775]
2 00:15:00 (0 days 00:10:00, 0 days 00:20:00]
3 00:00:00 (-1 days +23:59:59.999999, 0 days 00:10:00]
4 00:06:00 (-1 days +23:59:59.999999, 0 days 00:10:00]
If you want strings instead of timedeltas, use zip to create the labels after appending 'inf':
vals = ['00:00:00', '00:10:00', '00:20:00',
        '00:30:00', '00:40:00', '00:50:00', '00:60:00']
b = pd.to_timedelta(vals).append(pd.Index([pd.Timedelta.max]))
vals.append('inf')
labels = ['{}-{}'.format(i, j) for i, j in zip(vals[:-1], vals[1:])]
df['Bin'] = pd.cut(df['time_interval'], include_lowest=True, bins=b, labels=labels)
print (df)
time_interval Bin
0 00:17:00 00:10:00-00:20:00
1 01:42:00 00:60:00-inf
2 00:15:00 00:10:00-00:20:00
3 00:00:00 00:00:00-00:10:00
4 00:06:00 00:00:00-00:10:00
You could just use labels to solve it -
bins = pd.to_timedelta(['00:00:00', '00:10:00', '00:20:00', '00:30:00',
                        '00:40:00', '00:50:00', '00:60:00', '24:00:00'])
labels = ['(00:00:00,00:10:00]', '(00:10:00,00:20:00]', '(00:20:00,00:30:00]',
          '(00:30:00,00:40:00]', '(00:40:00,00:50:00]', '(00:50:00,00:60:00]',
          '(00:60:00,inf]']
df['Bin'] = pd.cut(df['interval_length'], bins=bins, labels=labels)
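The labels variant can be checked end to end on the question's five sample rows; here is a runnable sketch, with 24:00:00 standing in for inf and include_lowest=True so 00:00:00 is binned rather than dropped as NaN:

```python
import pandas as pd

df = pd.DataFrame({'interval_length': pd.to_timedelta(
    ['00:17:00', '01:42:00', '00:15:00', '00:00:00', '00:06:00'])})

bins = pd.to_timedelta(['00:00:00', '00:10:00', '00:20:00', '00:30:00',
                        '00:40:00', '00:50:00', '01:00:00', '24:00:00'])
labels = ['(00:00:00,00:10:00]', '(00:10:00,00:20:00]', '(00:20:00,00:30:00]',
          '(00:30:00,00:40:00]', '(00:40:00,00:50:00]', '(00:50:00,00:60:00]',
          '(00:60:00,inf]']
# include_lowest=True keeps the 00:00:00 row in the first bin.
df['Bin'] = pd.cut(df['interval_length'], bins=bins, labels=labels,
                   include_lowest=True)
```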

Pandas duration groupby - Start group-range with defined value

I am trying to group a data set of travel durations into 5-minute intervals, starting from 0 to inf. How may I do that?
My sample dataFrame looks like:
Duration
0 00:01:37
1 00:18:19
2 00:22:03
3 00:41:07
4 00:11:54
5 00:21:34
I have used this code: df.groupby([pd.Grouper(key='Duration', freq='5T')]).size()
And I have found following result:
Duration
00:01:37 1
00:06:37 0
00:11:37 1
00:16:37 2
00:21:37 1
00:26:37 0
00:31:37 0
00:36:37 1
00:41:37 0
Freq: 5T, dtype: int64
My expected result is:
Duration Counts
00:00:00 0
00:05:00 1
00:10:00 0
00:15:00 1
00:20:00 1
........ ...
My expectation is the index will start from 00:00:00 instead of 00:01:37.
Or, showing bins will also work for me, I mean:
Duration Counts
0-5 1
5-10 0
10-15 1
15-20 1
20-25 2
........ ...
I need your help please. Thank you.
First, you need to round each time down to the previous 5-minute mark, then simply count.
I suppose this is what you are looking for -
import datetime

def round_to_5min(t):
    """Round a timestamp down to the previous 5-minute mark."""
    return datetime.datetime(1991, 2, 13, t.hour, t.minute - t.minute % 5, 0)

# to_datetime restores a datetime64 dtype so the .dt accessor works
data['new_col'] = pd.to_datetime(data.Duration.map(round_to_5min)).dt.time
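Alternatively, assuming Duration is a timedelta column, this works without the dummy date: floor each value to 5 minutes, count, and reindex over the full range so empty buckets (and the 00:00:00 start) appear as zeros, as in the expected output:

```python
import pandas as pd

df = pd.DataFrame({'Duration': pd.to_timedelta(
    ['00:01:37', '00:18:19', '00:22:03', '00:41:07', '00:11:54', '00:21:34'])})

# Floor each duration to its 5-minute bucket and count occurrences.
counts = df['Duration'].dt.floor('5min').value_counts().sort_index()

# Reindex over the full 0 .. max range so empty buckets show up as 0.
full = pd.timedelta_range('0min', counts.index.max(), freq='5min')
counts = counts.reindex(full, fill_value=0)
```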
