This is my code
file = pd.read_excel("file name", sheet_name="data")
max_vol = file["Voltage"].max()
max_time = file.loc[file["Voltage"] == max_vol, "Timestamp"]
My Timestamp has data like this
0 2018-03-01 00:00:00
1 2018-03-01 00:05:00
2 2018-03-01 00:10:00
3 2018-03-01 00:15:00
4 2018-03-01 00:20:00
5 2018-03-01 00:25:00
6 2018-03-01 00:30:00
7 2018-03-01 00:35:00
8 2018-03-01 00:40:00
9 2018-03-01 00:45:00
10 2018-03-01 00:50:00
11 2018-03-01 00:55:00
12 2018-03-01 01:00:00
13 2018-03-01 01:05:00
14 2018-03-01 01:10:00
15 2018-03-01 01:15:00
16 2018-03-01 01:20:00
When I print max_time, I get a result like
624 2018-03-03 04:00:00
Name: Timestamp, dtype: datetime64[ns]
but I want only
2018-03-03 04:00:00
Can someone help me with this?
You can use idxmax to extract the index label of the largest element, and then use pd.DataFrame.loc:
df['Timestamp'] = pd.to_datetime(df['Timestamp']) # convert to datetime
res = df['Timestamp'].loc[df['Voltage'].idxmax()]
If you know your index is an integer range beginning at 0, e.g. [0, 1, 2], the label and the position coincide, so you can equivalently use the more efficient .iat or .iloc accessors.
pd.Series.idxmax returns the index label of the first occurrence of the maximum (in current pandas, pd.Series.argmax returns the integer position instead). pd.DataFrame.loc indexes by label, so combining the two gives the desired scalar.
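As a minimal sketch of getting a bare timestamp rather than a one-row Series: the sample values below are made up for illustration, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2018-03-03 03:55:00",
        "2018-03-03 04:00:00",
        "2018-03-03 04:05:00",
    ]),
    "Voltage": [229.1, 241.7, 230.4],
})

# idxmax gives the index label of the first maximum; .loc with a
# (row label, column name) pair returns a scalar instead of a Series.
max_time = df.loc[df["Voltage"].idxmax(), "Timestamp"]

# The boolean-mask approach from the question also works if you take
# the first element of the resulting Series.
max_time_from_mask = df.loc[df["Voltage"] == df["Voltage"].max(), "Timestamp"].iloc[0]

print(max_time)
```

Both variables hold a single pd.Timestamp, so printing them shows no index and no "Name: ..., dtype: ..." footer.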
Related
In the example dataframe below, how can I convert t_relative into hours? For example, the relative time in the first row would be 49 hours.
tstart tend t_relative
0 2131-05-16 23:00:00 2131-05-19 00:00:00 2 days 01:00:00
1 2131-05-16 23:00:00 2131-05-19 00:15:00 2 days 01:15:00
2 2131-05-16 23:00:00 2131-05-19 00:45:00 2 days 01:45:00
3 2131-05-16 23:00:00 2131-05-19 01:00:00 2 days 02:00:00
4 2131-05-16 23:00:00 2131-05-19 01:15:00 2 days 02:15:00
t_relative was calculated with the operation, df['t_relative'] = df['tend']-df['tstart'].
You can divide by a Timedelta of one hour:
df['t_relative'] / pd.Timedelta(hours=1)
Output:
0 49.00
1 49.25
2 49.75
3 50.00
4 50.25
Name: t_relative, dtype: float64
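For reference, here is a small self-contained sketch of the division, next to an equivalent route through total seconds. It rebuilds only the first two rows of the example frame.

```python
import pandas as pd

# First two rows of the example frame from the question.
df = pd.DataFrame({
    "tstart": pd.to_datetime(["2131-05-16 23:00:00", "2131-05-16 23:00:00"]),
    "tend": pd.to_datetime(["2131-05-19 00:00:00", "2131-05-19 00:15:00"]),
})
df["t_relative"] = df["tend"] - df["tstart"]

# Dividing a timedelta column by a one-hour Timedelta yields floats.
hours_by_division = df["t_relative"] / pd.Timedelta(hours=1)

# Equivalent: convert to seconds first, then scale to hours.
hours_by_seconds = df["t_relative"].dt.total_seconds() / 3600

print(hours_by_division.tolist())
```

Either form gives a float64 column, so the result can be rounded or compared like any other number.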
I am trying to add rows to my pandas dataframe as such:
import pandas as pd
import datetime as dt
d={'datetime':[dt.datetime(2018,3,1,0,0),dt.datetime(2018,3,1,0,10),dt.datetime(2018,3,1,0,40)],
'value':[4.,5.,1.]}
df=pd.DataFrame(d)
Which outputs:
datetime value
0 2018-03-01 00:00:00 4.0
1 2018-03-01 00:10:00 5.0
2 2018-03-01 00:40:00 1.0
What I want to do is add rows from 00:00:00 to 00:40:00, to show every 5 minutes. My desired output looks like this:
datetime value
0 2018-03-01 00:00:00 4.0
1 2018-03-01 00:05:00 NaN
2 2018-03-01 00:10:00 5.0
3 2018-03-01 00:15:00 NaN
4 2018-03-01 00:20:00 NaN
5 2018-03-01 00:25:00 NaN
6 2018-03-01 00:30:00 NaN
7 2018-03-01 00:35:00 NaN
8 2018-03-01 00:40:00 1.0
How do I get there?
You can use pd.DataFrame.resample:
df = df.resample('5min', on='datetime')[['value']].first().reset_index()
print(df)
datetime value
0 2018-03-01 00:00:00 4.0
1 2018-03-01 00:05:00 NaN
2 2018-03-01 00:10:00 5.0
3 2018-03-01 00:15:00 NaN
4 2018-03-01 00:20:00 NaN
5 2018-03-01 00:25:00 NaN
6 2018-03-01 00:30:00 NaN
7 2018-03-01 00:35:00 NaN
8 2018-03-01 00:40:00 1.0
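If no aggregation is needed, asfreq is another option. This is a sketch under the assumption that every existing timestamp already lies on the 5-minute grid; rows that fall off the grid would be dropped by asfreq rather than binned.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({
    "datetime": [
        dt.datetime(2018, 3, 1, 0, 0),
        dt.datetime(2018, 3, 1, 0, 10),
        dt.datetime(2018, 3, 1, 0, 40),
    ],
    "value": [4.0, 5.0, 1.0],
})

# asfreq reindexes onto a regular grid between the first and last
# timestamp; grid points without data become NaN.
filled = df.set_index("datetime").asfreq("5min").reset_index()
print(filled)
```

resample(...).first() is the safer choice when timestamps may be slightly off the grid, because it bins them instead of matching them exactly.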
First, create a dataframe indexed by the full target datetime range, then assign the values of your original dataframe to it:
import numpy as np
df1 = pd.DataFrame({'value': np.nan}, index=pd.date_range('2018-03-01 00:00:00',
periods=9, freq='5min'))
print(df1)
#Output :
value
2018-03-01 00:00:00 NaN
2018-03-01 00:05:00 NaN
2018-03-01 00:10:00 NaN
2018-03-01 00:15:00 NaN
2018-03-01 00:20:00 NaN
2018-03-01 00:25:00 NaN
2018-03-01 00:30:00 NaN
2018-03-01 00:35:00 NaN
2018-03-01 00:40:00 NaN
Now, taking your dataframe as the second one, add this to the code above:
d={'datetime':
[dt.datetime(2018,3,1,0,0),dt.datetime(2018,3,1,0,10),dt.datetime(2018,3,1,0,40)],
'value':[4.,5.,1.]}
df2=pd.DataFrame(d)
df2.datetime = pd.to_datetime(df2.datetime)
df2.set_index('datetime',inplace=True)
print(df2)
#Output
value
datetime
2018-03-01 00:00:00 4.0
2018-03-01 00:10:00 5.0
2018-03-01 00:40:00 1.0
Finally, assign the values; pandas aligns them on the datetime index:
df1.value = df2.value
print(df1)
#output
value
2018-03-01 00:00:00 4.0
2018-03-01 00:05:00 NaN
2018-03-01 00:10:00 5.0
2018-03-01 00:15:00 NaN
2018-03-01 00:20:00 NaN
2018-03-01 00:25:00 NaN
2018-03-01 00:30:00 NaN
2018-03-01 00:35:00 NaN
2018-03-01 00:40:00 1.0
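The same index-alignment idea can be written in one step with reindex. This is only a sketch: the names sparse, full_index and dense are illustrative and do not come from the answer above.

```python
import pandas as pd

# The original three observations, indexed by their timestamps.
sparse = pd.DataFrame(
    {"value": [4.0, 5.0, 1.0]},
    index=pd.to_datetime([
        "2018-03-01 00:00:00",
        "2018-03-01 00:10:00",
        "2018-03-01 00:40:00",
    ]),
)

# Build the complete 5-minute grid from the first to the last timestamp.
full_index = pd.date_range(sparse.index.min(), sparse.index.max(), freq="5min")

# reindex keeps rows whose labels exist and inserts NaN for the rest.
dense = sparse.reindex(full_index)
print(dense)
```

Deriving the grid from the data's own min and max avoids hard-coding the number of periods.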
I have data:
id time w
0 39 2018-03-01 00:00:00 1176.000000
1 39 2018-03-01 01:45:00 1033.461538
2 39 2018-03-01 02:00:00 1081.066667
3 39 2018-03-01 02:15:00 1067.909091
4 39 2018-03-01 02:30:00 1026.600000
5 39 2018-03-01 02:45:00 1051.866667
I have already grouped the original data into fifteen-minute intervals.
But I want to present:
id time w
0 39 2018-03-01 00:00:00 1176.000000
1 39 2018-03-01 00:15:00 NaN
2 39 2018-03-01 00:30:00 NaN
. 39 ... ... ...
. 39 ... ... ...
. 39 2018-03-01 01:30:00 NaN
1 39 2018-03-01 01:45:00 1033.461538
2 39 2018-03-01 02:00:00 1081.066667
3 39 2018-03-01 02:15:00 1067.909091
4 39 2018-03-01 02:30:00 1026.600000
5 39 2018-03-01 02:45:00 1051.866667
I tried this, but it did not work:
showData = (Data.groupby(['id', pd.Grouper(key='time', freq='15T')])
['w'].mean().replace('', np.nan).reset_index())
I really need your help. Many thanks.
Simply use resample:
df.resample('15min', on='time').mean()
id w
time
2018-03-01 00:00:00 39.0 1176.000000
2018-03-01 00:15:00 NaN NaN
2018-03-01 00:30:00 NaN NaN
2018-03-01 00:45:00 NaN NaN
2018-03-01 01:00:00 NaN NaN
2018-03-01 01:15:00 NaN NaN
2018-03-01 01:30:00 NaN NaN
2018-03-01 01:45:00 39.0 1033.461538
2018-03-01 02:00:00 39.0 1081.066667
2018-03-01 02:15:00 39.0 1067.909091
2018-03-01 02:30:00 39.0 1026.600000
2018-03-01 02:45:00 39.0 1051.866667
To fill in your id, you can forward-fill it with ffill():
resampled_df = df.resample('15min', on='time').mean()
resampled_df['id'] = resampled_df['id'].ffill()
resampled_df
id w
time
2018-03-01 00:00:00 39.0 1176.000000
2018-03-01 00:15:00 39.0 NaN
2018-03-01 00:30:00 39.0 NaN
2018-03-01 00:45:00 39.0 NaN
2018-03-01 01:00:00 39.0 NaN
2018-03-01 01:15:00 39.0 NaN
2018-03-01 01:30:00 39.0 NaN
2018-03-01 01:45:00 39.0 1033.461538
2018-03-01 02:00:00 39.0 1081.066667
2018-03-01 02:15:00 39.0 1067.909091
2018-03-01 02:30:00 39.0 1026.600000
2018-03-01 02:45:00 39.0 1051.866667
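The output above covers a single id. If the data held several ids, one possible approach is to resample each group on its own and stitch the pieces back together. The second id and all values below are invented for illustration.

```python
import pandas as pd

# Hypothetical data with two ids; only the column names follow the question.
df = pd.DataFrame({
    "id": [39, 39, 40, 40],
    "time": pd.to_datetime([
        "2018-03-01 00:00:00",
        "2018-03-01 00:45:00",
        "2018-03-01 00:15:00",
        "2018-03-01 00:30:00",
    ]),
    "w": [1176.0, 1033.0, 900.0, 950.0],
})

pieces = []
for id_value, group in df.groupby("id"):
    # Resample this id's rows only; empty 15-minute bins become NaN.
    resampled = group.resample("15min", on="time")[["w"]].mean()
    # Restore the id, which is constant within the group.
    resampled.insert(0, "id", id_value)
    pieces.append(resampled.reset_index())

result = pd.concat(pieces, ignore_index=True)
print(result)
```

Each id gets its own time range here, running from that id's first observation to its last, so no forward fill of the id column is needed.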
I have a DataFrame with data similar to the following
import pandas as pd; import numpy as np; import datetime; from datetime import timedelta;
df = pd.DataFrame(index=pd.date_range(start='20160102', end='20170301', freq='5min'))
df['value'] = np.random.randn(df.index.size)
df.index += pd.Series([timedelta(seconds=np.random.randint(-60, 60))
for _ in range(df.index.size)])
which looks like this
In[37]: df
Out[37]:
value
2016-01-02 00:00:33 0.546675
2016-01-02 00:04:52 1.080558
2016-01-02 00:10:46 -1.551206
2016-01-02 00:15:52 -1.278845
2016-01-02 00:19:04 -1.672387
2016-01-02 00:25:36 -0.786985
2016-01-02 00:29:35 1.067132
2016-01-02 00:34:36 -0.575365
2016-01-02 00:39:33 0.570341
2016-01-02 00:44:56 -0.636312
...
2017-02-28 23:14:57 -0.027981
2017-02-28 23:19:51 0.883150
2017-02-28 23:24:15 -0.706997
2017-02-28 23:30:09 -0.954630
2017-02-28 23:35:08 -1.184881
2017-02-28 23:40:20 0.104017
2017-02-28 23:44:10 -0.678742
2017-02-28 23:49:15 -0.959857
2017-02-28 23:54:36 -1.157165
2017-02-28 23:59:10 0.527642
Now, I'm aiming to get the mean per 5 minute period over the course of a 24 hour day - without considering what day those values actually come from.
How can I do this effectively? I would like to think I could somehow remove the actual dates from my index and then use something like pd.TimeGrouper, but I haven't figured out how to do so.
My not-so-great solution
My solution so far has been to use between_time in a loop like this, just using an arbitrary day.
aggregates = []
start_time = datetime.datetime(1990, 1, 1, 0, 0, 0)
while start_time < datetime.datetime(1990, 1, 1, 23, 59, 0):
    aggregates.append(
        (
            start_time,
            df.between_time(start_time.time(),
                            (start_time + timedelta(minutes=5)).time(),
                            inclusive='left').value.mean()
        )
    )
    start_time += timedelta(minutes=5)
result = pd.DataFrame(aggregates, columns=['time', 'value'])
which works as expected
In[68]: result
Out[68]:
time value
0 1990-01-01 00:00:00 0.032667
1 1990-01-01 00:05:00 0.117288
2 1990-01-01 00:10:00 -0.052447
3 1990-01-01 00:15:00 -0.070428
4 1990-01-01 00:20:00 0.034584
5 1990-01-01 00:25:00 0.042414
6 1990-01-01 00:30:00 0.043388
7 1990-01-01 00:35:00 0.050371
8 1990-01-01 00:40:00 0.022209
9 1990-01-01 00:45:00 -0.035161
.. ... ...
278 1990-01-01 23:10:00 0.073753
279 1990-01-01 23:15:00 -0.005661
280 1990-01-01 23:20:00 -0.074529
281 1990-01-01 23:25:00 -0.083190
282 1990-01-01 23:30:00 -0.036636
283 1990-01-01 23:35:00 0.006767
284 1990-01-01 23:40:00 0.043436
285 1990-01-01 23:45:00 0.011117
286 1990-01-01 23:50:00 0.020737
287 1990-01-01 23:55:00 0.021030
[288 rows x 2 columns]
But this doesn't feel like a very Pandas-friendly solution.
IIUC then the following should work:
In [62]:
df.groupby(df.index.floor('5min').time).mean()
Out[62]:
value
00:00:00 -0.038002
00:05:00 -0.011646
00:10:00 0.010701
00:15:00 0.034699
00:20:00 0.041164
00:25:00 0.151187
00:30:00 -0.006149
00:35:00 -0.008256
00:40:00 0.021389
00:45:00 0.016851
00:50:00 -0.074825
00:55:00 0.012861
01:00:00 0.054048
01:05:00 0.041907
01:10:00 -0.004457
01:15:00 0.052428
01:20:00 -0.021518
01:25:00 -0.019010
01:30:00 0.030887
01:35:00 -0.085415
01:40:00 0.002386
01:45:00 -0.002189
01:50:00 0.049720
01:55:00 0.032292
02:00:00 -0.043642
02:05:00 0.067132
02:10:00 -0.029628
02:15:00 0.064098
02:20:00 0.042731
02:25:00 -0.031113
... ...
21:30:00 -0.018391
21:35:00 0.032155
21:40:00 0.035014
21:45:00 -0.016979
21:50:00 -0.025248
21:55:00 0.027896
22:00:00 -0.117036
22:05:00 -0.017970
22:10:00 -0.008494
22:15:00 -0.065303
22:20:00 -0.014623
22:25:00 0.076994
22:30:00 -0.030935
22:35:00 0.030308
22:40:00 -0.124668
22:45:00 0.064853
22:50:00 0.057913
22:55:00 0.002309
23:00:00 0.083586
23:05:00 -0.031043
23:10:00 -0.049510
23:15:00 0.003520
23:20:00 0.037135
23:25:00 -0.002231
23:30:00 -0.029592
23:35:00 0.040335
23:40:00 -0.021513
23:45:00 0.104421
23:50:00 -0.022280
23:55:00 -0.021283
[288 rows x 1 columns]
Here I floor the index to 5-minute intervals, then group on the time attribute and aggregate with the mean.
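To make the binning easy to check by hand, here is a tiny deterministic version of the same idea. The four timestamps and values are made up.

```python
import pandas as pd

# Hypothetical readings spread over two different days.
index = pd.to_datetime([
    "2016-01-02 00:00:33",
    "2016-01-02 00:04:52",
    "2016-01-03 00:01:10",
    "2016-01-03 00:06:00",
])
df = pd.DataFrame({"value": [1.0, 3.0, 5.0, 7.0]}, index=index)

# floor('5min') snaps each timestamp down to its 5-minute bin, and
# .time drops the date, so rows from different days share a group key.
result = df.groupby(df.index.floor("5min").time).mean()
print(result)
```

The first three rows all land in the 00:00:00 bin, giving a mean of 3.0, and the last row is alone in 00:05:00. Note that floor assigns 00:04:52 to the 00:00:00 bin; if jittered timestamps should instead snap to the nearest 5-minute mark, index.round('5min') may be closer to what you want.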
My DataFrame is in the form:
TimeWeek TimeSat TimeHoli
0 6:40:00 8:00:00 8:00:00
1 6:45:00 8:05:00 8:05:00
2 6:50:00 8:09:00 8:10:00
3 6:55:00 8:11:00 8:14:00
4 6:58:00 8:13:00 8:17:00
5 7:40:00 8:15:00 8:21:00
I need to find the time difference between consecutive rows in TimeWeek, TimeSat and TimeHoli. The output must be:
TimeWeekDiff TimeSatDiff TimeHoliDiff
00:05:00 00:05:00 00:05:00
00:05:00 00:04:00 00:05:00
00:05:00 00:02:00 00:04:00
00:03:00 00:02:00 00:03:00
00:02:00 00:02:00 00:04:00
I tried using df['TimeWeek'] - df['TimeWeek'].shift().fillna(0), but it throws an error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
Probably because of the presence of ':' in the column. How do I resolve this?
It looks like the error is thrown because the data is in the form of a string instead of a timestamp. First convert them to timestamps:
df2 = df.apply(lambda x: [pd.Timestamp(ts) for ts in x])
They will contain today's date by default, but this shouldn't matter once you difference the time (hopefully you don't have to worry about differencing 23:55 and 00:05 across dates).
Once converted, simply difference the DataFrame:
>>> df2 - df2.shift()
TimeWeek TimeSat TimeHoli
0 NaT NaT NaT
1 00:05:00 00:05:00 00:05:00
2 00:05:00 00:04:00 00:05:00
3 00:05:00 00:02:00 00:04:00
4 00:03:00 00:02:00 00:03:00
5 00:42:00 00:02:00 00:04:00
Depending on your needs, you can just take rows 1+ (ignoring the NaTs):
(df2 - df2.shift()).iloc[1:, :]
or you can fill the NaTs with zero-length timedeltas:
(df2 - df2.shift()).fillna(pd.Timedelta(0))
Forget everything I just said. Pandas has great timedelta parsing.
df["TimeWeek"] = pd.to_timedelta(df["TimeWeek"])
df['TimeWeek'] - df['TimeWeek'].shift().fillna(pd.to_timedelta("00:00:00"))
>>> import pandas as pd
>>> df = pd.DataFrame({'TimeWeek': ['6:40:00', '6:45:00', '6:50:00', '6:55:00', '7:40:00']})
>>> df["TimeWeek_date"] = pd.to_datetime(df["TimeWeek"], format="%H:%M:%S")
>>> print(df)
TimeWeek TimeWeek_date
0 6:40:00 1900-01-01 06:40:00
1 6:45:00 1900-01-01 06:45:00
2 6:50:00 1900-01-01 06:50:00
3 6:55:00 1900-01-01 06:55:00
4 7:40:00 1900-01-01 07:40:00
>>> df['TimeWeekDiff'] = (df['TimeWeek_date'] - df['TimeWeek_date'].shift().fillna(pd.to_datetime("00:00:00", format="%H:%M:%S")))
>>> print(df)
TimeWeek TimeWeek_date TimeWeekDiff
0 6:40:00 1900-01-01 06:40:00 06:40:00
1 6:45:00 1900-01-01 06:45:00 00:05:00
2 6:50:00 1900-01-01 06:50:00 00:05:00
3 6:55:00 1900-01-01 06:55:00 00:05:00
4 7:40:00 1900-01-01 07:40:00 00:45:00
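Putting the timedelta approach together for all three columns of the question, a minimal sketch could look like this. The names deltas and diffs are illustrative, and the Diff suffix mirrors the column names in the desired output.

```python
import pandas as pd

# The frame from the question, with times stored as strings.
df = pd.DataFrame({
    "TimeWeek": ["6:40:00", "6:45:00", "6:50:00", "6:55:00", "6:58:00", "7:40:00"],
    "TimeSat": ["8:00:00", "8:05:00", "8:09:00", "8:11:00", "8:13:00", "8:15:00"],
    "TimeHoli": ["8:00:00", "8:05:00", "8:10:00", "8:14:00", "8:17:00", "8:21:00"],
})

# Parse every column from 'H:MM:SS' strings into timedeltas.
deltas = df.apply(pd.to_timedelta)

# diff() subtracts the previous row; the first row has no predecessor
# (NaT), so it is dropped. add_suffix renames the columns.
diffs = deltas.diff().iloc[1:].add_suffix("Diff")
print(diffs)
```

With the data as given, the last TimeWeek difference comes out as 42 minutes (6:58:00 to 7:40:00).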