Suppose I'm given a pandas DataFrame whose index has dtype timedelta64[ns]:
A B C D E
0 days 00:00:00 0.642973 -0.041259 253.377516 0.0
0 days 00:15:00 0.647493 -0.041230 253.309167 0.0
0 days 00:30:00 0.723258 -0.063110 253.416138 0.0
0 days 00:45:00 0.739604 -0.070342 253.305809 0.0
0 days 01:00:00 0.643327 -0.041131 252.967084 0.0
... ... ... ... ...
364 days 22:45:00 0.650392 -0.064805 249.658052 0.0
364 days 23:00:00 0.652765 -0.064821 249.243891 0.0
364 days 23:15:00 0.607198 -0.103190 249.553821 0.0
364 days 23:30:00 0.597602 -0.107975 249.687942 0.0
364 days 23:45:00 0.595224 -0.110376 250.059530 0.0
There does not appear to be any "permitted" way of converting the index to datetimes. Basic operations to convert the index such as:
df.index = pd.DatetimeIndex(df.index)
Or:
test_df.time = pd.to_datetime(test_df.index,format='%Y%m%d%H%M')
Both yield:
TypeError: dtype timedelta64[ns] cannot be converted to datetime64[ns]
Is there any permitted way to do this other than manually reformatting all of these (very numerous) datasets? Each dataset covers one year at 15-minute intervals.
The issue is that a timedelta cannot be converted directly to a datetime: a timedelta is a duration (the difference between two datetimes), not a point in time. From your question it sounds like all of these deltas are measured from the same base time, so you need to add that base time to them. For example:
In [1]: import datetime
In [2]: now = datetime.datetime.now()
In [3]: delta = datetime.timedelta(minutes=5)
In [4]: print(now, delta + now)
2021-02-22 20:14:37.273444 2021-02-22 20:19:37.273444
The second printed datetime is 5 minutes after now.
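In pandas the same idea works on the whole index at once: adding a Timestamp to a TimedeltaIndex gives a DatetimeIndex. A minimal sketch, assuming the data starts at 2021-01-01 00:00 (the question does not say, so substitute the real start time) and using a one-column placeholder frame instead of the real data:

```python
import numpy as np
import pandas as pd

# Placeholder for the question's frame: one year of 15-minute steps
# stored as a TimedeltaIndex (offsets from an unknown start time).
offsets = pd.timedelta_range(start="0 days", periods=4 * 24 * 365, freq="15min")
df = pd.DataFrame({"A": np.zeros(len(offsets))}, index=offsets)

# Assumed base time; replace it with the real start of each dataset.
base = pd.Timestamp("2021-01-01")

# Timestamp + TimedeltaIndex -> DatetimeIndex
df.index = base + df.index

print(df.index[:3])
print(df.index[-1])  # 2021-12-31 23:45:00
```

Because the conversion is a single vectorized addition, it can be applied to each dataset in a loop without reformatting any files by hand.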
I have two dataframes, df_rates and df_profit, shown below. df_rates has a datetime value as its column name, with certain rates as its values, and the index values denote the minutes before that datetime (i.e. row 1 is 0 minutes before 2012-03-31 23:45:00, row 2 is 5 minutes before 2012-03-31 23:45:00, and so on). df_profit has timestamps as its index and a Profit column.
To build a new data frame based on certain conditions, I wrote the code below, but I get the error "TypeError: can only concatenate str (not "int") to str". I don't understand why this happens, since I don't think there are any strings involved. Can someone please help?
df_rates
Mins before time 2012-03-31 23:45:00
0 113.1
5 112.1
10 113.1
15 113.17
20 103.17
25 133.17
30 101.39
df_profit
Profit
2012-04-01 00:30:00 251.71
2012-04-01 00:15:00 652.782
2012-04-01 00:00:00 458.099
2012-03-31 23:45:00 3504.664
2012-03-31 23:30:00 1215.76
2012-03-31 23:15:00 -21.48
2012-03-31 23:00:00 -8.538
2012-03-31 22:40:00 -5.11
Code (anchor_time is of type <class 'pandas._libs.tslibs.timestamps.Timestamp'> and lookback_minutes is of type <class 'int'>):
anchor_time = df_rates.columns[-1]
lookback_minutes = 30
df_rates = ( df_rates
.set_index(anchor_time - pd.to_timedelta(df_rates['Mins before time'] + lookback_minutes, unit='min'))
.join(df_profit).reset_index(drop=True))
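Based on your code, I think you want something like this. I don't know if it is your expected end result, but it will let you keep going.
Note that the error came from doing time arithmetic on strings: the "Mins before time" values were apparently stored as text, so adding the integer lookback_minutes to them raised the TypeError. Convert them to numbers and timedeltas (and the anchor to a Timestamp) first.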
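import datetime
import pandas as pd

lookback_minutes = datetime.timedelta(minutes=30)
# The last column label holds the anchor time; make sure it is a Timestamp
anchor_time = pd.to_datetime(df_rates.columns[-1])
df_profit.index = pd.to_datetime(df_profit.index)
# Convert the minute offsets (possibly stored as text) to numbers, then to timedeltas
minutes_before = pd.to_timedelta(pd.to_numeric(df_rates["Mins before time"]), unit="min")
df_rates = df_rates.set_index(anchor_time - minutes_before + lookback_minutes)
df_merged = df_rates.join(df_profit)
df_merged.index.names = ['Datetime']
df_merged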
Output:
                     Mins before time  2012-03-31 23:45:00    Profit
Datetime
2012-04-01 00:15:00                 0               113.10   652.782
2012-04-01 00:10:00                 5               112.10       NaN
2012-04-01 00:05:00                10               113.10       NaN
2012-04-01 00:00:00                15               113.17   458.099
2012-03-31 23:55:00                20               103.17       NaN
2012-03-31 23:50:00                25               133.17       NaN
2012-03-31 23:45:00                30               101.39  3504.664
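The snippet above depends on the asker's frames. Here is a self-contained sketch that retypes the sample data from the question; storing "Mins before time" as strings is an assumption made to reproduce the reported TypeError, and pd.to_numeric undoes it:

```python
import pandas as pd

# Sample data retyped from the question. Assumption: the minute offsets
# arrive as text, which would make "offset + 30" fail with a str/int TypeError.
df_rates = pd.DataFrame({
    "Mins before time": ["0", "5", "10", "15", "20", "25", "30"],
    "2012-03-31 23:45:00": [113.1, 112.1, 113.1, 113.17, 103.17, 133.17, 101.39],
})
df_profit = pd.DataFrame(
    {"Profit": [251.71, 652.782, 458.099, 3504.664, 1215.76, -21.48, -8.538, -5.11]},
    index=["2012-04-01 00:30:00", "2012-04-01 00:15:00", "2012-04-01 00:00:00",
           "2012-03-31 23:45:00", "2012-03-31 23:30:00", "2012-03-31 23:15:00",
           "2012-03-31 23:00:00", "2012-03-31 22:40:00"],
)

# Parse the anchor (a column label) and the lookback as proper time types.
anchor_time = pd.to_datetime(df_rates.columns[-1])
lookback = pd.Timedelta(minutes=30)

# Text -> integers -> timedeltas in minutes.
minutes_before = pd.to_timedelta(pd.to_numeric(df_rates["Mins before time"]), unit="min")

# Same arithmetic as the answer: anchor - offset + lookback.
new_index = pd.DatetimeIndex(anchor_time - minutes_before + lookback)

df_profit.index = pd.to_datetime(df_profit.index)
df_merged = df_rates.set_index(new_index).join(df_profit)
df_merged.index.name = "Datetime"
print(df_merged)
```

The join is a left join on the datetime index, so rate rows with no matching profit timestamp get NaN, as in the output above.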
I have a pandas column containing only times of day (no dates), in increasing order.
I use to_datetime() on that column, but it attaches the same date to every row and does not move to the next day when the times cross midnight.
How can I tell it to increment the day each time the times cross midnight?
rail[8].iloc[121]
rail[8].iloc[100]
Printing these values gives:
TIME 2020-11-19 00:18:00
Name: DSG, dtype: datetime64[ns]
TIME 2020-11-19 21:12:27
Name: KG, dtype: datetime64[ns]
whereas iloc[121] should be on 2020-11-20.
This is how the data is built, followed by a sample of it:
df1.columns = df1.iloc[0]
ids = df1.loc['TRAIN NO'].unique()
df1.drop('TRAIN NO',axis=0,inplace=True)
rail = {}
for i in range(len(ids)):
    rail[i] = df1.filter(like=ids[i])
    rail[i] = rail[i].reset_index()
    rail[i].rename(columns={0:'TRAIN NO'},inplace=True)
    rail[i] = pd.melt(rail[i],id_vars='TRAIN NO',value_name='TIME',var_name='trainId')
    rail[i].drop(columns='trainId',inplace=True)
    rail[i].rename(columns={'TRAIN NO': 'CheckPoints'},inplace=True)
    rail[i].set_index('CheckPoints',inplace=True)
    rail[i].dropna(inplace=True)
    rail[i]['TIME'] = pd.to_datetime(rail[i]['TIME'],infer_datetime_format=True)
CheckPoints TIME
DEPOT 2020-11-19 05:10:00
KG 2020-11-19 05:25:00
RI 2020-11-19 05:51:11
RI 2020-11-19 06:00:00
KG 2020-11-19 06:25:44
... ...
DSG 2020-11-19 23:41:50
ATHA 2020-11-19 23:53:56
NBAA 2020-11-19 23:58:00
NBAA 2020-11-19 00:01:00
DSG 2020-11-19 00:18:00
Could someone help me out..!
You can check where the difference between subsequent timestamps is negative, which means the date has changed. The cumulative sum of that mask counts how many midnights have been crossed so far; add it as a timedelta in days to your datetime column:
import pandas as pd
df = pd.DataFrame({'time': ["23:00", "00:00", "12:00", "23:00", "01:00"]})
# cast time string to datetime, will automatically add today's date by default
df['datetime'] = pd.to_datetime(df['time'])
# get timedelta between subsequent timestamps in the column; df['datetime'].diff()
# compare to get a boolean mask where the change in time is negative (= new date)
m = df['datetime'].diff() < pd.Timedelta(0)
# m
# 0 False
# 1 True
# 2 False
# 3 False
# 4 True
# Name: datetime, dtype: bool
# the cumulated sum of that mask accumulates the booleans as 0/1:
# m.cumsum()
# 0 0
# 1 1
# 2 1
# 3 1
# 4 2
# Name: datetime, dtype: int32
# ...so we can use that as the date offset, which we add as timedelta to the datetime column:
df['datetime'] += pd.to_timedelta(m.cumsum(), unit='d')
df
time datetime
0 23:00 2020-11-19 23:00:00
1 00:00 2020-11-20 00:00:00
2 12:00 2020-11-20 12:00:00
3 23:00 2020-11-20 23:00:00
4 01:00 2020-11-21 01:00:00
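A variant of the same mask-and-cumsum idea that does not rely on whichever date pd.to_datetime attaches to a bare time: parse the times as offsets from midnight with pd.to_timedelta and add an explicit service date. This sketch assumes the raw TIME values are "HH:MM:SS" strings, that the service starts on 2020-11-19, and that consecutive stops are never 24 hours or more apart; the sample rows (a hypothetical frame named stops) are modelled on the tail of the question's data:

```python
import pandas as pd

# Hypothetical sample shaped like the question: times of day only,
# in running order, crossing midnight near the end.
stops = pd.DataFrame(
    {"TIME": ["23:41:50", "23:53:56", "23:58:00", "00:01:00", "00:18:00"]},
    index=pd.Index(["DSG", "ATHA", "NBAA", "NBAA", "DSG"], name="CheckPoints"),
)

# Assumed service date (the question's data starts on 2020-11-19).
service_date = pd.Timestamp("2020-11-19")

# Time of day as an offset from midnight; no date is attached here.
tod = pd.to_timedelta(stops["TIME"])

# A negative step means the clock wrapped past midnight; the running
# count of wraps is the number of days to add.
day_offset = (tod.diff() < pd.Timedelta(0)).cumsum()

stops["TIME"] = service_date + tod + pd.to_timedelta(day_offset, unit="D")
print(stops)
```

The first row's diff is NaT, which compares as False, so the count starts at zero days.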
I would like to subtract datetimes in pandas, taking the rows in pairs (row 2 minus row 1, row 4 minus row 3, and so on), but I don't know which function to use.
Timestamp
2020-11-26 20:00:00
2020-11-26 21:00:00
2020-11-26 22:00:00
2020-11-26 23:30:00
Explanation:
(2020-11-26 21:00:00) - (2020-11-26 20:00:00)
(2020-11-26 23:30:00) - (2020-11-26 22:00:00)
The result must be:
01:00:00
01:30:00
First, check that the column has a datetime dtype.
If it doesn't, convert it with pd.to_datetime():
import pandas as pd
demo = pd.DataFrame(columns=['Timestamps'])
demotime = ['20:00:00','21:00:00','22:00:00','23:30:00']
demo['Timestamps'] = demotime
demo['Timestamps'] = pd.to_datetime(demo['Timestamps'])
Your dataframe would look like:
Timestamps
0 2020-11-29 20:00:00
1 2020-11-29 21:00:00
2 2020-11-29 22:00:00
3 2020-11-29 23:30:00
After that, loop over the rows in steps of two and subtract each pair:
for i in range(0, len(demo) - 1, 2):
    print(demo.iloc[i+1, 0] - demo.iloc[i, 0])
IIUC, you want to take the rows in chunks of two and find the difference within each chunk. One approach is:
import numpy as np
res = df.groupby(np.arange(len(df)) // 2).diff().dropna()
print(res)
Output
Timestamp
1 0 days 01:00:00
3 0 days 01:30:00
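If the rows always come in consecutive pairs, positional slicing is an alternative to the groupby (this is not from either answer above): take every second row starting at row 1 and subtract every second row starting at row 0. A sketch using the question's four timestamps:

```python
import pandas as pd

df = pd.DataFrame({"Timestamp": pd.to_datetime([
    "2020-11-26 20:00:00", "2020-11-26 21:00:00",
    "2020-11-26 22:00:00", "2020-11-26 23:30:00",
])})

ts = df["Timestamp"]
ts = ts.iloc[: len(ts) // 2 * 2]  # drop a trailing unpaired row, if any

# .to_numpy() drops the index, so the subtraction is positional
# (row 1 - row 0, row 3 - row 2) rather than aligned on labels.
pair_diff = pd.Series(ts.iloc[1::2].to_numpy() - ts.iloc[::2].to_numpy())
print(pair_diff)
```

The result is a timedelta64 Series with one entry per pair; unlike the groupby version, it is renumbered from 0 instead of keeping the labels of the second row in each pair.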
I have two high-frequency time series covering 3 months of data.
The problem is that one runs from 15:30 to 23:00, the other from 01:00 to 00:00.
Is there any way to match the two time series, discarding the extra data, so that I can run a regression analysis?
You can use the combine_first method of pandas Series. Where both series have a value at the same index label, it keeps the value from the calling series; the remaining labels are filled from the other series.
The following code shows a minimal example:
import pandas as pd
idx1 = pd.date_range('2018-01-01', periods=5, freq='h')
idx2 = pd.date_range('2018-01-01 01:00', periods=5, freq='h')
ts1 = pd.Series(range(len(idx1)), index=idx1)
ts2 = pd.Series(range(len(idx2)), index=idx2)
ts1.combine_first(ts2)
This gives a Series with the content:
2018-01-01 00:00:00 0.0
2018-01-01 01:00:00 1.0
2018-01-01 02:00:00 2.0
2018-01-01 03:00:00 3.0
2018-01-01 04:00:00 4.0
2018-01-01 05:00:00 4.0
For more complex combinations you can use combine.
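Note that combine_first keeps the union of both indexes, while the question asks to discard the data that does not overlap. For that, an inner join on the index keeps only the timestamps present in both series. A sketch with hypothetical half-hourly data over a single day, shaped like the two sessions in the question:

```python
import pandas as pd

# Hypothetical stand-ins: one series covers 15:30-23:00, the other 01:00-00:00.
idx1 = pd.date_range("2018-01-01 15:30", "2018-01-01 23:00", freq="30min")
idx2 = pd.date_range("2018-01-01 01:00", "2018-01-02 00:00", freq="30min")
ts1 = pd.Series(range(len(idx1)), index=idx1, name="ts1")
ts2 = pd.Series(range(len(idx2)), index=idx2, name="ts2")

# join="inner" keeps only the timestamps that exist in both series.
a, b = ts1.align(ts2, join="inner")

# The same intersection as a two-column frame, convenient for a regression.
both = pd.concat([ts1, ts2], axis=1, join="inner")
print(a.index.min(), a.index.max(), len(a))
print(both.shape)
```

Both forms only match timestamps that are exactly equal; if the two series are sampled at slightly different instants, they would need to be resampled onto a common grid first.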
I have two columns; the time an event started and the duration of that event. Like so:
time, duration
1:22:51,41
1:56:29,36
2:02:06,12
2:32:37,38
2:34:51,24
3:24:07,31
3:28:47,59
3:31:19,32
3:42:52,37
3:57:04,58
4:21:55,23
4:40:28,17
4:52:39,51
4:54:48,26
5:17:06,46
6:08:12,1
6:21:34,12
6:22:48,24
7:04:22,1
7:06:28,46
7:19:12,51
7:19:19,4
7:22:27,27
7:32:25,53
I want to create a line chart that shows the number of concurrent events happening throughout the day. Renaming time to start_time and adding a new column that computes the end_time is easy enough (assuming that's the next step) -- what I'm not quite sure I understand is how, afterwards, I can resample this data so I can chart concurrents.
I imagine I want to wind up with something like (but bucketed by the minute):
time, events
1:30:00,1
2:00:00,2
2:30:00,1
3:00:00,1
3:30:00,2
First, make it an actual timestamp (the date attached here is arbitrary):
df['time'] = pd.to_datetime('2014-03-14 ' + df['time'])
Now you can get the end times:
df['end_time'] = df['time'] + pd.to_timedelta(df['duration'], unit='min')
One way to count the open events is to put +1 at each start time and -1 at each end time, then resample, sum and take the cumulative sum:
In [11]: open_events = pd.concat([pd.Series(1, df.time),       # event started: add 1
                                  pd.Series(-1, df.end_time)   # event ended: subtract 1
                                  ]).resample('30min').sum().cumsum()
In [12]: open_events
Out[12]:
2014-03-14 01:00:00 1
2014-03-14 01:30:00 2
2014-03-14 02:00:00 1
2014-03-14 02:30:00 1
2014-03-14 03:00:00 2
2014-03-14 03:30:00 4
2014-03-14 04:00:00 2
2014-03-14 04:30:00 2
2014-03-14 05:00:00 2
2014-03-14 05:30:00 1
2014-03-14 06:00:00 2
2014-03-14 06:30:00 0
2014-03-14 07:00:00 3
2014-03-14 07:30:00 2
2014-03-14 08:00:00 0
Freq: 30T, dtype: int64
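The question asked for per-minute buckets; the same +1/-1 trick works with a 1-minute resample. A sketch on the first four rows of the question's data, attaching the same arbitrary date as the answer but building the timestamps with pd.to_timedelta instead of string concatenation:

```python
import pandas as pd

# First four rows of the question's data: start time and duration in minutes.
df = pd.DataFrame({
    "time": ["1:22:51", "1:56:29", "2:02:06", "2:32:37"],
    "duration": [41, 36, 12, 38],
})

# Arbitrary calendar date plus the time of day as an offset from midnight.
df["time"] = pd.Timestamp("2014-03-14") + pd.to_timedelta(df["time"])
df["end_time"] = df["time"] + pd.to_timedelta(df["duration"], unit="min")

# +1 when an event starts, -1 when it ends.
changes = pd.concat([
    pd.Series(1, index=df["time"]),
    pd.Series(-1, index=df["end_time"]),
])

# Net change per minute, then a running total = events open after that minute.
concurrent = changes.resample("1min").sum().cumsum()
print(concurrent.max())  # 3
```

Each value is the running count after all starts and ends inside that minute, so an event that starts and ends within the same minute never shows up.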
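You could build a list of dictionaries with "time" and "events" keys, one per bucket.
You will need to adapt how the time values are compared and manipulated to your data types, but something like this works, assuming start_time, end_time, num_of_buckets and time_entry are already defined:
# start_time, end_time: datetimes bounding the period to chart
# num_of_buckets: number of sample points
# time_entry: list of dicts with "start_time" and "end_time" datetimes
event_bucket = []
time_interval = (end_time - start_time) / num_of_buckets
for ii in range(num_of_buckets):
    event_bucket.append({"time": start_time + ii * time_interval, "events": 0})
# count each event in every bucket whose time falls inside the event
for entry in time_entry:
    for bucket in event_bucket:
        if entry["start_time"] <= bucket["time"] <= entry["end_time"]:
            bucket["events"] += 1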
If you make num_of_buckets larger you make the graph more precise.
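A runnable version of the bucket idea, wrapped in a hypothetical helper bucket_counts (the name and the list-of-pairs input format are illustrative, not from the answer), using the first four rows of the question's data sampled every 30 minutes:

```python
import datetime


def bucket_counts(entries, start_time, end_time, num_of_buckets):
    """Count how many (start, end) events cover each evenly spaced sample time.

    Hypothetical helper: `entries` is a list of (start, end) datetime pairs.
    """
    step = (end_time - start_time) / num_of_buckets
    buckets = [{"time": start_time + i * step, "events": 0} for i in range(num_of_buckets)]
    for start, end in entries:
        for bucket in buckets:
            if start <= bucket["time"] <= end:
                bucket["events"] += 1
    return buckets


day = datetime.datetime(2014, 3, 14)  # arbitrary date, as in the pandas answer
# First four rows of the question: (hour, minute, second) and duration in minutes.
rows = [((1, 22, 51), 41), ((1, 56, 29), 36), ((2, 2, 6), 12), ((2, 32, 37), 38)]
entries = []
for (h, m, s), minutes in rows:
    start = day.replace(hour=h, minute=m, second=s)
    entries.append((start, start + datetime.timedelta(minutes=minutes)))

# 5 buckets from 01:00 to 03:30 -> samples at 01:00, 01:30, ..., 03:00.
buckets = bucket_counts(entries, day.replace(hour=1), day.replace(hour=3, minute=30), 5)
for b in buckets:
    print(b["time"].strftime("%H:%M"), b["events"])
```

Unlike the resample/cumsum answer, this counts the events open at each sample instant, and its cost grows with events times buckets, which is fine for a day of data but slow for large inputs.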