Pandas Add one day after midnight - python

I am trying to add one day to times after midnight.
For example, I have a datetime64 column in a pandas DataFrame.
Originally, my csv file only has times like 12:13:00, 07:12:53, 02:33:27.
I wanted to add a date to the time because the file name contains a date. The catch is that I have to add one day to times after midnight.
Here's an example.
This is original data with the file name mycsv_20180101.csv
time
22:00:00
23:00:00
03:00:00
This is what I want.
time
2018-01-01 22:00:00
2018-01-01 23:00:00
2018-01-02 03:00:00 # this is the point.
Is there any way to do it?
I've thought about it for a while, and my idea is:
firstly, add a date;
secondly, df['time'].apply(lambda x: x + pd.to_timedelta('1d') if x.dt.hour < 6 else False) # before 6 am, I assume it's the next day
but it says 'The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().' I don't know why...
Thank you for your help in advance.

Suppose your dataframe and date from file are like these:
df = pd.DataFrame({'time': ["18:10:00","19:10:00","20:10:00","21:10:00","22:10:00","23:10:00","00:10:00","01:10:00","02:10:00","03:10:00"]})
file_date = '20180101'
You first need to add file_date to your data
df.time = df.time.apply(lambda x: ' '.join((file_date, x)))
which yields:
time
0 20180101 18:10:00
1 20180101 19:10:00
2 20180101 20:10:00
3 20180101 21:10:00
4 20180101 22:10:00
5 20180101 23:10:00
6 20180101 00:10:00
7 20180101 01:10:00
8 20180101 02:10:00
9 20180101 03:10:00
What you need to do is convert them into datetime type and add a day if hour is smaller than 4.
df.time = pd.to_datetime(df.time).apply(lambda x: x + pd.DateOffset(days=1) if x.hour <=3 else x)
which gives your desired output of:
time
0 2018-01-01 18:10:00
1 2018-01-01 19:10:00
2 2018-01-01 20:10:00
3 2018-01-01 21:10:00
4 2018-01-01 22:10:00
5 2018-01-01 23:10:00
6 2018-01-02 00:10:00
7 2018-01-02 01:10:00
8 2018-01-02 02:10:00
9 2018-01-02 03:10:00
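The same result can be reached without apply, using a vectorized mask (a sketch, assuming the asker's 6 am cutoff). As an aside, the asker's error comes from evaluating a boolean Series in an if; inside apply, x is a scalar Timestamp, so it would be x.hour rather than x.dt.hour.

```python
import pandas as pd

# A sketch of the same fix without apply, assuming the 6 am cutoff
df = pd.DataFrame({'time': ['22:00:00', '23:00:00', '03:00:00']})
file_date = '20180101'

ts = pd.to_datetime(file_date + ' ' + df['time'])
# add one day to rows whose hour is before the cutoff
df['time'] = ts.mask(ts.dt.hour < 6, ts + pd.Timedelta(days=1))
```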

Related

how to add values to specific date in pandas?

So I have a dataset with a specific date for every data point. I want to fill these values according to their specific date into an Excel sheet that contains the date range of the whole year. The dates start at 01-01-2020 00:00:00 and end at 31-12-2020 23:45:00 with a frequency of 15 minutes, so there will be a total of 35,136 date-time values in the sheet (2020 is a leap year).
my data is like:
load date
12 01-02-2020 06:30:00
21 29-04-2020 03:45:00
23 02-07-2020 12:15:00
54 07-08-2020 16:00:00
23 22-09-2020 16:30:00
As you can see, these values are not continuous, but they have specific dates with them, so I want to use these date values as the index, put each value at its particular date in the sheet, and put zero in the missing values. Can someone please help?
Use DataFrame.reindex with date_range, so 0 is filled in for all datetimes that do not exist in the data:
rng = pd.date_range('2020-01-01','2020-12-31 23:45:00', freq='15Min')
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date').reindex(rng, fill_value=0)
print(df)
load
2020-01-01 00:00:00 0
2020-01-01 00:15:00 0
2020-01-01 00:30:00 0
2020-01-01 00:45:00 0
2020-01-01 01:00:00 0
...
2020-12-31 22:45:00 0
2020-12-31 23:00:00 0
2020-12-31 23:15:00 0
2020-12-31 23:30:00 0
2020-12-31 23:45:00 0
[35136 rows x 1 columns]
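For a self-contained version, here is a minimal sketch; the two sample rows and the day-first date parsing (the question's dates look like dd-mm-yyyy) are my assumptions.

```python
import pandas as pd

# Two hypothetical sample readings in the question's day-first format
df = pd.DataFrame({'load': [12, 21],
                   'date': ['01-02-2020 06:30:00', '29-04-2020 03:45:00']})

rng = pd.date_range('2020-01-01', '2020-12-31 23:45:00', freq='15Min')
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
# every missing 15-minute slot gets a 0
df = df.set_index('date').reindex(rng, fill_value=0)
```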

Compare two dataframes and keep a specific datetime range of another

I have two dataframes with timestamps. I want to select the timestamps from df1 that equal the 'start_show' timestamps of df2, but also keep all timestamps of df1 from 2 hours before to 2 hours after each match.
df1:
van_timestamp weekdag
2880 2016-11-19 00:00:00 6
2881 2016-11-19 00:15:00 6
2882 2016-11-19 00:30:00 6
... ... ...
822349 2019-11-06 22:45:00 3
822350 2019-11-06 23:00:00 3
822351 2019-11-06 23:15:00 3
df2:
einde_show start_show
255 2016-01-16 22:00:00 2016-01-16 20:00:00
256 2016-01-23 21:30:00 2016-01-23 19:45:00
257 2016-01-26 21:30:00 2016-01-26 19:45:00
... ... ...
1111 2019-12-29 18:30:00 2019-12-29 17:00:00
1112 2019-12-30 15:00:00 2019-12-30 13:30:00
1113 2019-12-30 18:30:00 2019-12-30 17:00:00
df1 contains a timestamp every 15 minutes of every day whereas df2['start_show'] contains just a single timestamp per day.
So ultimately, what I want to achieve is that for every timestamp of df2 I have the corresponding timestamps of df1 ± 2 hours.
So far I've tried:
df1['van_timestamp'][df1['van_timestamp'].isin(df2['start_show'])]
This selects the right timestamps. Now I want to select everything from df1 in the range of
+ pd.Timedelta(2, unit='h')
- pd.Timedelta(2, unit='h')
But I'm not sure how to go about this. Help would be much appreciated!
Thanks!
I got it working (an ugly fix). I created a datetime range for each show:
dates = [pd.date_range(start=df2['start_show'].iloc[i] - pd.Timedelta(2, unit='h'), end=df2['start_show'].iloc[i] + pd.Timedelta(2, unit='h'), freq='15T') for i in range(len(df2))]
Which I then unlisted:
dates = [i for sublist in dates for i in sublist]
Afterwards I compared the dataframe with this list.
relevant_timestamps = df1[df1['van_timestamp'].isin(dates)]
If anyone else has a better solution, please let me know!
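A less list-heavy alternative is to build one boolean mask per show window and OR them together; a sketch with hypothetical minimal stand-ins for df1 and df2:

```python
import pandas as pd

# Hypothetical minimal stand-ins for df1 and df2
df1 = pd.DataFrame({'van_timestamp': pd.date_range('2016-01-16 17:00:00',
                                                   periods=30, freq='15T')})
df2 = pd.DataFrame({'start_show': pd.to_datetime(['2016-01-16 20:00:00'])})

two_h = pd.Timedelta(hours=2)
mask = pd.Series(False, index=df1.index)
for s in df2['start_show']:
    # between() is inclusive on both ends by default
    mask |= df1['van_timestamp'].between(s - two_h, s + two_h)

relevant = df1[mask]
```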

How to find the datetime difference between rows in a column, based on the condition?

I have the following pandas DataFrame df:
date time val1
2018-12-31 09:00:00 15
2018-12-31 10:00:00 22
2018-12-31 11:00:00 19
2018-12-31 11:30:00 10
2018-12-31 11:45:00 5
2018-12-31 12:00:00 1
2018-12-31 12:05:00 6
I want to find how many minutes are between the val1 value that is greater than 20 and the val1 value that is lower than or equal to 5?
In this example, the answer is 1 hour and 45 minutes = 105 minutes.
I know how to check the difference between two datetime values:
(df.from_datetime-df.to_datetime).astype('timedelta64[m]')
But how to slice it over the DataFrame, detecting the proper rows?
UPDATE: Taking into consideration that date might be different
Convert the date column to a datetime object and time column to a timedelta object and combine them to get another datetime object
df.time = pd.to_timedelta(df.time)
df.date = pd.to_datetime(df.date)
df['date_time'] = df['date'] + df['time']
df
date time val1 date_time
0 2018-12-31 09:00:00 15 2018-12-31 09:00:00
1 2018-12-31 10:00:00 22 2018-12-31 10:00:00
2 2018-12-31 11:00:00 19 2018-12-31 11:00:00
3 2018-12-31 11:30:00 10 2018-12-31 11:30:00
4 2018-12-31 11:45:00 5 2018-12-31 11:45:00
5 2018-12-31 12:00:00 1 2018-12-31 12:00:00
6 2018-12-31 12:05:00 6 2018-12-31 12:05:00
Now we could use one of these two methods.
1) Love lambdas, and this works with Series objects.
subtr = lambda d1, d2: abs(d1 - d2)/np.timedelta64(1, 'm')
d20 = df[df.val1 > 20].date_time.iloc[0]
d5 = df[df.val1 <= 5].date_time.iloc[0]
subtr(d20, d5)
105.0
2) Needs a DataFrame object instead of a Series object. Clashes with my aesthetics.
d20 = df[df.val1 > 20][['date_time']].iloc[0]
d5 = df[df.val1 <= 5][['date_time']].iloc[0]
abs(d5 - d20).astype('timedelta64[m]')[0]
105.0
So this is my approach:
1) Filter out any rows where val1 is neither >= 20 nor <= 5
df = pd.DataFrame({'date':['2018-12-31','2018-12-31','2018-12-31','2018-12-31','2018-12-31','2018-12-31','2018-12-31'],
'time':['09:00:00', '10:00:00', '11:00:00', '11:30:00', '11:45:00', '12:00:00', '12:05:00'],
'val1': [15,22,19,10,5,1,6]})
df2 = df[(df['val1'] >= 20)|(df['val1'] <= 5)].copy()
Then we will do the following code:
df2['TimeDiff'] = np.where(df2['val1'] - df2['val1'].shift(-1) >= 15,
                           df2['time'].astype('datetime64[ns]').shift(-1) - df2['time'].astype('datetime64[ns]'),
                           np.NaN)
Let me go through this.
np.where is like an if statement: if the first argument is true, it returns the second; if not, the third.
df2['val1'] - df2['val1'].shift(-1) >= 15: since we filtered the df, the minimum difference between two qualifying rows must be greater than or equal to 15.
If it is true:
df2['time'].astype('datetime64[ns]').shift(-1) - df2['time'].astype('datetime64[ns]'): we take the later time and subtract the earlier time.
If not true, we just return np.NaN.
We get a df that looks like the following:
date time val1 TimeDiff
1 2018-12-31 10:00:00 22 01:45:00
4 2018-12-31 11:45:00 5 NaT
5 2018-12-31 12:00:00 1 NaT
If you want to put the TimeDiff on the end time you can do the following:
df2['TimeDiff'] = np.where(df2['val1'] - df2['val1'].shift(1) <= -15,
                           df2['time'].astype('datetime64[ns]') - df2['time'].astype('datetime64[ns]').shift(),
                           np.NaN)
and you will get:
date time val1 TimeDiff
1 2018-12-31 10:00:00 22 NaT
4 2018-12-31 11:45:00 5 01:45:00
5 2018-12-31 12:00:00 1 NaT
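Both answers pick the first qualifying row on each side; the same lookup works in one compact sketch (the variable names here are mine):

```python
import pandas as pd

df = pd.DataFrame({'date': ['2018-12-31'] * 7,
                   'time': ['09:00:00', '10:00:00', '11:00:00', '11:30:00',
                            '11:45:00', '12:00:00', '12:05:00'],
                   'val1': [15, 22, 19, 10, 5, 1, 6]})
dt = pd.to_datetime(df['date'] + ' ' + df['time'])

t_high = dt[df['val1'] > 20].iloc[0]                   # first value above 20
t_low = dt[(df['val1'] <= 5) & (dt > t_high)].iloc[0]  # first later value at or below 5
minutes = (t_low - t_high) / pd.Timedelta(minutes=1)
```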

Convert and order time in a pandas df

I am trying to order timestamps in a pandas df. The times begin around 08:00:00 am and finish around 3:00:00 am. I'd like to add 24 hrs to times after midnight, so the times read 08:00:00 to 27:00:00. The problem is that the times aren't ordered.
Example:
import pandas as pd
d = ({
'time' : ['08:00:00 am','12:00:00 pm','16:00:00 pm','20:00:00 pm','2:00:00 am','13:00:00 pm','3:00:00 am'],
})
df = pd.DataFrame(data=d)
If I try to order the times via
df = pd.DataFrame(data=d)
df['time'] = pd.to_timedelta(df['time'])
df = df.sort_values(by='time',ascending=True)
Out:
time
4 02:00:00
6 03:00:00
0 08:00:00
1 12:00:00
5 13:00:00
2 16:00:00
3 20:00:00
Whereas I'm hoping the output is:
time
0 08:00:00
1 12:00:00
2 13:00:00
3 16:00:00
4 20:00:00
5 26:00:00
6 27:00:00
I'm not sure if this can be done, though. Specifically, whether I can differentiate between 8:00:00 am and the times after midnight (1 am-3 am).
Add a day offset for times after midnight and before when a new "day" is supposed to begin (pick some time after 3 am and before 7 am), then sort values:
cutoff, day = pd.to_timedelta(['3.5H', '24H'])
df.time.apply(lambda x: x if x > cutoff else x + day).sort_values().reset_index(drop=True)
# Out:
0 0 days 08:00:00
1 0 days 12:00:00
2 0 days 13:00:00
3 0 days 16:00:00
4 0 days 20:00:00
5 1 days 02:00:00
6 1 days 03:00:00
The last two values are numerically equal to 26 hours and 27 hours, just displayed differently.
If you need them in HH:MM:SS format, use string formatting with the appropriate timedelta components.
Ex:
x = df.time.apply(lambda x: x if x > cutoff else x + day).sort_values().reset_index(drop=True).dt.components
x.apply(lambda x: '{:02d}:{:02d}:{:02d}'.format(x.days*24+x.hours, x.minutes, x.seconds), axis=1)
#Out:
0 08:00:00
1 12:00:00
2 13:00:00
3 16:00:00
4 20:00:00
5 26:00:00
6 27:00:00
dtype: object
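An alternative to the components-based formatting is to format total seconds directly, so hours can run past 23. This is a sketch; stripping the am/pm suffix before parsing and the 3.5-hour cutoff are assumptions carried over from the answer above.

```python
import pandas as pd

d = {'time': ['08:00:00 am', '12:00:00 pm', '16:00:00 pm', '20:00:00 pm',
              '2:00:00 am', '13:00:00 pm', '3:00:00 am']}
df = pd.DataFrame(data=d)

# strip the am/pm suffix before parsing as a timedelta
t = pd.to_timedelta(df['time'].str.replace(r'\s*[ap]m$', '', regex=True))
cutoff = pd.Timedelta(hours=3.5)
# push post-midnight times past 24h, then sort
t = t.where(t > cutoff, t + pd.Timedelta(hours=24)).sort_values().reset_index(drop=True)

hhmmss = t.dt.total_seconds().astype(int).apply(
    lambda s: '{:02d}:{:02d}:{:02d}'.format(s // 3600, s % 3600 // 60, s % 60))
```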

Get weekday/day-of-week for Datetime column of DataFrame

I have a DataFrame df like the following (excerpt, 'Timestamp' are the index):
Timestamp Value
2012-06-01 00:00:00 100
2012-06-01 00:15:00 150
2012-06-01 00:30:00 120
2012-06-01 01:00:00 220
2012-06-01 01:15:00 80
...and so on.
I need a new column df['weekday'] with the respective weekday/day-of-week of the timestamps.
How can I get this?
Use the new dt.dayofweek property:
In [2]:
df['weekday'] = df['Timestamp'].dt.dayofweek
df
Out[2]:
Timestamp Value weekday
0 2012-06-01 00:00:00 100 4
1 2012-06-01 00:15:00 150 4
2 2012-06-01 00:30:00 120 4
3 2012-06-01 01:00:00 220 4
4 2012-06-01 01:15:00 80 4
In the situation where the Timestamp is your index you need to reset the index and then call the dt.dayofweek property:
In [14]:
df = df.reset_index()
df['weekday'] = df['Timestamp'].dt.dayofweek
df
Out[14]:
Timestamp Value weekday
0 2012-06-01 00:00:00 100 4
1 2012-06-01 00:15:00 150 4
2 2012-06-01 00:30:00 120 4
3 2012-06-01 01:00:00 220 4
4 2012-06-01 01:15:00 80 4
Strangely, if you try to create a Series from the index (in order not to reset the index), you get NaN values; the same happens if you call dt.dayofweek on the result of reset_index without assigning that result back to the original df. This is because the new Series carries a fresh integer index that does not align with the df's DatetimeIndex:
In [16]:
df['weekday'] = pd.Series(df.index).dt.dayofweek
df
Out[16]:
Value weekday
Timestamp
2012-06-01 00:00:00 100 NaN
2012-06-01 00:15:00 150 NaN
2012-06-01 00:30:00 120 NaN
2012-06-01 01:00:00 220 NaN
2012-06-01 01:15:00 80 NaN
In [17]:
df['weekday'] = df.reset_index()['Timestamp'].dt.dayofweek
df
Out[17]:
Value weekday
Timestamp
2012-06-01 00:00:00 100 NaN
2012-06-01 00:15:00 150 NaN
2012-06-01 00:30:00 120 NaN
2012-06-01 01:00:00 220 NaN
2012-06-01 01:15:00 80 NaN
EDIT
As pointed out to me by user #joris you can just access the weekday attribute of the index so the following will work and is more compact:
df['Weekday'] = df.index.weekday
If the Timestamp column is a datetime value, then you can just use:
df['weekday'] = df['Timestamp'].apply(lambda x: x.weekday())
or
df['weekday'] = pd.to_datetime(df['Timestamp']).apply(lambda x: x.weekday())
You can get it this way:
df['weekday'] = pd.Series(df.index).dt.day_name()
In case somebody else has the same issue with a multi-indexed dataframe, here is what solved it for me, based on @joris' solution:
df['Weekday'] = df.index.get_level_values(1).weekday
For me, the date was at get_level_values(1); get_level_values(0) would work for the outer index.
A note on pandas versions: pandas 1.1.0 deprecated dt.weekofyear, while dt.dayofweek (used by @EdChum and @Artyom Krivolapov above) remains available. If you want ISO day numbering instead (Monday=1 through Sunday=7, rather than Monday=0 through Sunday=6), you can use:
df['weekday'] = df['Timestamp'].dt.isocalendar().day
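Note that the two accessors use different numbering conventions; a quick check (2012-06-01 is the Friday used in the examples above):

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(['2012-06-01']))  # a Friday
dow = ts.dt.dayofweek.iloc[0]           # Monday=0 .. Sunday=6
iso = ts.dt.isocalendar().day.iloc[0]   # ISO: Monday=1 .. Sunday=7
```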
