Adding a datetime column in pandas dataframe from minute values - python

I have a data frame with a time column holding minutes from 0-1339, meaning 1440 minutes of a day. I want to add a datetime column representing the day 2021-3-21, including hh and mm, like this: 1980-03-01 11:00. I tried the following code:
from datetime import datetime, timedelta
date = datetime.date(2021, 3, 21)
days = date - datetime.date(1900, 1, 1)
df['datetime'] = pd.to_datetime(df['time'],format='%H:%M:%S:%f') + pd.to_timedelta(days, unit='d')
But I get the error: descriptor 'date' requires a 'datetime.datetime' object but received a 'int'
Is there another way to solve this problem, or a way to fix this code? Please help me figure this out.
>>df
time
0
1
2
3
..
1339
I want to convert these minutes to a format like 1980-03-01 11:00, using the date 2021-3-21 and converting the minutes to the hh:mm part. The dataframe should look like this:
>df
datetime time
2021-3-21 00:00 0
2021-3-21 00:01 1
2021-3-21 00:02 2
...
How can I format my data in this way?

Let's try pd.to_timedelta instead to turn the minutes in time into durations, then add them to a Timestamp:
df['datetime'] = (
    pd.Timestamp('2021-3-21') + pd.to_timedelta(df['time'], unit='m')
)
df.head():
time datetime
0 0 2021-03-21 00:00:00
1 1 2021-03-21 00:01:00
2 2 2021-03-21 00:02:00
3 3 2021-03-21 00:03:00
4 4 2021-03-21 00:04:00
Complete Working Example with Sample Data:
import numpy as np
import pandas as pd
df = pd.DataFrame({'time': np.arange(0, 1440)})
df['datetime'] = (
    pd.Timestamp('2021-3-21') + pd.to_timedelta(df['time'], unit='m')
)
print(df)
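The question asks for the hh:mm form without seconds. One possible way to get it, sketched here under the assumption that a string column is acceptable for display, is to format the computed datetimes with .dt.strftime. The column name datetime_str is made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'time': np.arange(0, 1440)})
df['datetime'] = pd.Timestamp('2021-03-21') + pd.to_timedelta(df['time'], unit='m')

# strftime returns plain strings, so keep the real datetime column
# for any further date arithmetic (datetime_str is a hypothetical name)
df['datetime_str'] = df['datetime'].dt.strftime('%Y-%m-%d %H:%M')
print(df.tail(3))
```

The datetime column stays a datetime64 column; only datetime_str is text.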

Related

Is there a better way to increment a timestamp column in a pandas dataframe?

I'm working with a large pandas dataframe and want to add a timestamp column which correlates to the value of another column. For example, the current dataframe looks like this:
Server      Hour
server1     0
server2     0
server1000  0
server1     1
server2     1
and so on, with the Hour column ranging from 0-167, as the values correspond to the hourly timestamps of the following week.
I have the following code which establishes the weekly timestamps:
today = datetime.today()
start = (today - timedelta(days=today.weekday())).replace(hour=0, minute=0, second=0, microsecond=0)
end = (start + timedelta(days=6)).replace(hour=0, minute=0, second=0, microsecond=0)
print("end: " + str(end))
From there, I try to create the new "time" column arithmetically:
end=end.timestamp()
total_df['time']=end
total_df['time'] = total_df['time'].astype(float) #to convert to a float so I can multiply it with the time column
total_df['time']=total_df['time']+3600*total_df['time'] #standardize timestamp to Sunday since the initial "end" was monday
Then I convert the time column back to a string and convert the unix timestamp to datetime
total_df['hour'] = total_df['hour'].astype(str)
total_df['hour']=pd.to_datetime(total_df['hour'],unit='s', utc='true')
Unfortunately, this method doesn't use my current timezone and standardizes to UTC, so the finalized hourly timestamps are 4 hours ahead of where they should be. I can account for this by subtracting 4 hours before conversion, but I feel like there must be a cleaner way to do this using datetime. My solution seems like such a roundabout way to say "add however many hours are in the hour column."
My expected output should look like this:
Server      Hour  Time
server1     0     2022-04-24 00:00:00-04:00
server2     0     2022-04-24 00:00:00-04:00
serverx     0     2022-04-24 00:00:00-04:00
server1000  0     2022-04-24 00:00:00-04:00
server1     1     2022-04-24 01:00:00-04:00
server2     1     2022-04-24 01:00:00-04:00
serverx     1     2022-04-24 01:00:00-04:00
server1000  1     2022-04-24 01:00:00-04:00
x           x     x
server1000  167   2022-04-30 23:00:00-04:00
with the "x" and "serverx" covering all of the server and hour values between 1 and 1000 and 1 and 167, respectively.
Alternatively, is there an easy way to convert between time zones? My current output column looks like it should, except it's in UTC, and I'd like it in EST.
Do I understand correctly that you start out with a dataframe that has an hour column, for example:
df = pd.DataFrame({'hour': range(5)})
hour
0 0
1 1
2 2
3 3
4 4
In this case you could try the following:
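import pandas as pd
from datetime import date, datetime, timedelta
start = date.today()
df['time'] = (
    datetime(start.year, start.month, start.day)
    + timedelta(days=6 - start.weekday())
    # pd.to_timedelta is the portable spelling; newer pandas versions
    # reject astype('timedelta64[h]')
    + pd.to_timedelta(df['hour'], unit='h')
).dt.tz_localize('EST')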
Result:
hour time
0 0 2022-04-24 00:00:00-05:00
1 1 2022-04-24 01:00:00-05:00
2 2 2022-04-24 02:00:00-05:00
3 3 2022-04-24 03:00:00-05:00
4 4 2022-04-24 04:00:00-05:00
Or use an explicit timezone offset:
import pandas as pd
from datetime import date, datetime, timedelta, timezone
start = date.today()
df['time'] = (
    datetime(
        start.year, start.month, start.day, tzinfo=timezone(timedelta(hours=-5))
    )
    + timedelta(days=6 - start.weekday())
    # pd.to_timedelta is the portable spelling; newer pandas versions
    # reject astype('timedelta64[h]')
    + pd.to_timedelta(df['hour'], unit='h')
)
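The expected output in the question shows a -04:00 offset, which is daylight time for the US East Coast in April, whereas 'EST' is a fixed -05:00 zone. A minimal sketch follows, assuming the intended zone is 'America/New_York' (the question does not name it) and using a fixed, illustrative start date so the result is reproducible. The second part shows one way to convert an existing UTC column, which was the alternative the question asked about.

```python
import pandas as pd

df = pd.DataFrame({'hour': range(3)})
week_start = pd.Timestamp('2022-04-24')  # fixed start date, for illustration only

# Option 1: build naive local times, then attach the (assumed) zone
df['time'] = (
    week_start + pd.to_timedelta(df['hour'], unit='h')
).dt.tz_localize('America/New_York')

# Option 2: convert timestamps that are already in UTC
utc_times = pd.Series(
    pd.to_datetime(['2022-04-24 04:00:00', '2022-04-24 05:00:00'], utc=True)
)
local_times = utc_times.dt.tz_convert('America/New_York')

print(df)
print(local_times)
```

tz_localize labels naive times with a zone without shifting them, while tz_convert shifts zone-aware times from one zone to another.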

pandas - combine time and date from two dataframe columns to a datetime column

This is a follow-up question to the accepted solution here.
I have a pandas dataframe:
In one column 'time' is the time stored in the following format: 'HHMMSS' (e.g. 203412 means 20:34:12).
In another column 'date' the date is stored in the following format: 'YYmmdd' (e.g 200712 means 2020-07-12). YY represents the addon to the year 2000.
Example:
import pandas as pd
data = {'time': ['123455', '000010', '100000'],
        'date': ['200712', '210601', '190610']}
df = pd.DataFrame(data)
print(df)
# time date
#0 123455 200712
#1 000010 210601
#2 100000 190610
I need a third column which contains the combined datetime format (e.g. 2020-07-12 12:34:55) of the two other columns. So far, I can only modify the time but I do not know how to add the date.
df['datetime'] = pd.to_datetime(df['time'], format='%H%M%S')
print(df)
# time date datetime
#0 123455 200712 1900-01-01 12:34:55
#1 000010 210601 1900-01-01 00:00:10
#2 100000 190610 1900-01-01 10:00:00
How can I add in column df['datetime'] the date from column df['date'], so that the dataframe is:
time date datetime
0 123455 200712 2020-07-12 12:34:55
1 000010 210601 2021-06-01 00:00:10
2 100000 190610 2019-06-10 10:00:00
I found this question, but I am not exactly sure how to use it for my purpose.
You can join the columns first and then specify the format:
df['datetime'] = pd.to_datetime(df['date'] + df['time'], format='%y%m%d%H%M%S')
print(df)
time date datetime
0 123455 200712 2020-07-12 12:34:55
1 000010 210601 2021-06-01 00:00:10
2 100000 190610 2019-06-10 10:00:00
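If the columns may hold integers, convert them to strings first and restore the leading zeros that integers drop:
df['datetime'] = pd.to_datetime(df['date'].astype(str).str.zfill(6) + df['time'].astype(str).str.zfill(6), format='%y%m%d%H%M%S')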
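With integer columns, a value such as 000010 is stored as 10, so the concatenated string would be too short for the format. A small sketch of one way to handle that, assuming both columns are always meant to be six digits wide, pads each piece with .str.zfill(6) before joining. The names date_str and time_str are made up for this illustration.

```python
import pandas as pd

# Same sample values as above, but stored as integers (leading zeros are lost)
df = pd.DataFrame({'time': [123455, 10, 100000],
                   'date': [200712, 210601, 190610]})

# Pad both parts back to six characters before concatenating
date_str = df['date'].astype(str).str.zfill(6)
time_str = df['time'].astype(str).str.zfill(6)

df['datetime'] = pd.to_datetime(date_str + time_str, format='%y%m%d%H%M%S')
print(df)
```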

converting "H:M:S" string in pandas to datetime object

import pandas as pd
import datetime
dictt={'s_time': ["06:30:00", "07:30:00","16:30:00"], 'f_time': ["10:30:00", "23:30:00","23:30:00"]}
df=pd.DataFrame(dictt)
In this case I want to convert these times into datetime objects so I can later use them for calculations.
When I run df['s_time']=pd.to_datetime(df['s_time'],format='%H:%M:%S').dt.time
it gives the error:
time data '24:00:00' does not match format '%H:%M:%S' (match)
so I don't know how to fix this.
"24:00:00" means "00:00:00"
If it's just "24:00:00" that's causing trouble, you can replace the "24:" prefix with "00:":
import pandas as pd
df = pd.DataFrame({'time': ["06:30:24", "07:24:00", "24:00:00"]})
# replace prefix "24:" with "00:"
df['time'] = df['time'].str.replace('^24:', '00:', regex=True)
# now to_datetime
df['time'] = pd.to_datetime(df['time'])
df['time']
0 2021-04-17 06:30:24
1 2021-04-17 07:24:00
2 2021-04-17 00:00:00
Name: time, dtype: datetime64[ns]
1 to 24 hour clock (instead of 0 to 23)
If however your time notation goes from 1 to 24 hours (instead of 0 to 23), you can parse string to timedelta, subtract one hour and then cast to datetime:
df = pd.DataFrame({'time': ["06:30:24", "07:24:00", "24:00:00"]})
# to timedelta and subtract one hour
df['time'] = pd.to_timedelta(df['time']) - pd.Timedelta(hours=1)
# to string and then datetime:
df['time'] = pd.to_datetime(df['time'].astype(str).str.split(' ').str[-1])
df['time']
0 2021-04-17 05:30:24
1 2021-04-17 06:24:00
2 2021-04-17 23:00:00
Name: time, dtype: datetime64[ns]
Note: the underlying assumption here is that the date is irrelevant. If there also is a date, see the related question I linked in the comments section.
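Since the question ends with .dt.time, here is a short sketch that combines the "24:" replacement with an explicit format, so the result is a column of datetime.time objects with no date attached. It assumes, as stated in the question, that "24:00:00" should be read as "00:00:00". The column name time_obj is made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'time': ["06:30:24", "07:24:00", "24:00:00"]})

# Treat "24:..." as "00:..." (assumption taken from the question)
cleaned = df['time'].str.replace('^24:', '00:', regex=True)

# An explicit format avoids guessing; .dt.time drops the placeholder date
df['time_obj'] = pd.to_datetime(cleaned, format='%H:%M:%S').dt.time
print(df)
```

Note that datetime.time values do not support subtraction directly, so for duration calculations a timedelta or full datetime column may be the better fit.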

How to calculate a mean of measurements taken at the same time (n-hours window) on different days in pandas dataframe?

I have a dataset with measurements acquired almost every 2 hours over a week. I would like to calculate the mean of measurements taken at the same time on different days. For example, I want to calculate the mean of every measurement taken between 12:00 and 13:59.
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
#generating test dataframe
date_today = datetime.now()
time_of_taken_measurment = pd.date_range(date_today, date_today +
timedelta(72), freq='2H20MIN')
np.random.seed(seed=1111)
data = np.random.randint(1, high=100,
size=len(time_of_taken_measurment))
df = pd.DataFrame({'measurementTimestamp': time_of_taken_measurment, 'measurment': data})
df = df.set_index('measurementTimestamp')
#Calculating the mean for measurements taken in the same hour
hourly_average = df.groupby([df.index.hour]).mean()
hourly_average
The code above gives me this output:
0 47.967742
1 43.354839
2 46.935484
.....
22 42.833333
23 52.741935
I would like to have a result like this:
0 mean0
2 mean1
4 mean2
.....
20 mean10
22 mean11
I was trying to solve my problem using rolling_mean function, but I could not find a way to apply it to my static case.
Use the built-in floor method of DatetimeIndex, which lets you easily create 2-hour time bins:
df.groupby(df.index.floor('2H').time).mean()
Output:
measurment
00:00:00 51.516129
02:00:00 54.868852
04:00:00 52.935484
06:00:00 43.177419
08:00:00 43.903226
10:00:00 55.048387
12:00:00 50.639344
14:00:00 48.870968
16:00:00 43.967742
18:00:00 49.225806
20:00:00 43.774194
22:00:00 50.590164
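To make the binning easy to check by hand, here is a small deterministic sketch of the same idea. The data is invented for illustration: two days of hourly readings whose values simply count up from 0. It uses the lowercase 'h' alias, since recent pandas versions deprecate the uppercase 'H'.

```python
import numpy as np
import pandas as pd

# Two days of hourly readings with values 0..47 (made-up data)
idx = pd.date_range('2022-01-01', periods=48, freq='h')
df = pd.DataFrame({'measurement': np.arange(48)}, index=idx)

# Floor each timestamp to its 2-hour bin, then group by time of day only,
# so the same bin on different days is averaged together
two_hour_mean = df.groupby(df.index.floor('2h').time).mean()
print(two_hour_mean)
```

The 00:00 bin averages the readings from hours 0 and 1 on both days (0, 1, 24, 25), giving 12.5.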

Add months to a date in Pandas

I'm trying to figure out how to add 3 months to a date in a Pandas dataframe while keeping it in the date format, so I can use it to look up a range.
This is what I've tried:
#create dataframe
df = pd.DataFrame([pd.Timestamp('20161011'),
pd.Timestamp('20161101') ], columns=['date'])
#create a future month period
plus_month_period = 3
#calculate date + future period
df['future_date'] = plus_month_period.astype("timedelta64[M]")
However, I get the following error:
AttributeError: 'int' object has no attribute 'astype'
You could use pd.DateOffset
In [1756]: df.date + pd.DateOffset(months=plus_month_period)
Out[1756]:
0 2017-01-11
1 2017-02-01
Name: date, dtype: datetime64[ns]
Details
In [1757]: df
Out[1757]:
date
0 2016-10-11
1 2016-11-01
In [1758]: plus_month_period
Out[1758]: 3
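For reference, the same pd.DateOffset approach as a self-contained script, reusing the dataframe and the plus_month_period variable from the question:

```python
import pandas as pd

df = pd.DataFrame({'date': [pd.Timestamp('20161011'),
                            pd.Timestamp('20161101')]})
plus_month_period = 3

# DateOffset moves by calendar months and keeps the datetime64 dtype
df['future_date'] = df['date'] + pd.DateOffset(months=plus_month_period)
print(df)
```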
Suppose you have a dataframe of the following format, where you have to add integer months to a date column.
Start_Date  Months_to_add
2014-06-01  23
2014-06-01  4
2000-10-01  10
2016-07-01  3
2017-12-01  90
2019-01-01  2
In such a scenario, Zero's code or mattblack's code won't be useful, since the number of months differs from row to row. Instead, apply a lambda over the rows that calls a function taking 2 arguments:
A date to which months need to be added to
A month value in integer format
You can use the following function:
# Importing required modules
from dateutil.relativedelta import relativedelta
# Defining the function
def add_months(start_date, delta_period):
    end_date = start_date + relativedelta(months=delta_period)
    return end_date
After this you can use the following code snippet to add months to the Start_Date column. It uses progress_apply, which tqdm adds to pandas objects once tqdm.pandas() has been called; a plain apply works the same way without the progress bar. Refer to this Stackoverflow answer on progress_apply: Progress indicator during pandas operations.
from tqdm import tqdm
tqdm.pandas()
df["End_Date"] = df.progress_apply(lambda row: add_months(row["Start_Date"], row["Months_to_add"]), axis = 1)
Here's the full code from dataset creation, for your reference:
import pandas as pd
from dateutil.relativedelta import relativedelta
from tqdm import tqdm
tqdm.pandas()
# Initialize a new dataframe
df = pd.DataFrame()
# Add Start Date column
df["Start_Date"] = ['2014-06-01T00:00:00.000000000',
'2014-06-01T00:00:00.000000000',
'2000-10-01T00:00:00.000000000',
'2016-07-01T00:00:00.000000000',
'2017-12-01T00:00:00.000000000',
'2019-01-01T00:00:00.000000000']
# To convert the date column to a datetime format
df["Start_Date"] = pd.to_datetime(df["Start_Date"])
# Add months column
df["Months_to_add"] = [23, 4, 10, 3, 90, 2]
# Defining the Add Months function
def add_months(start_date, delta_period):
    end_date = start_date + relativedelta(months=delta_period)
    return end_date
# Apply function on the dataframe using lambda operation.
df["End_Date"] = df.progress_apply(lambda row: add_months(row["Start_Date"], row["Months_to_add"]), axis = 1)
You will have the final output dataframe as follows.
Start_Date  Months_to_add  End_Date
2014-06-01  23             2016-05-01
2014-06-01  4              2014-10-01
2000-10-01  10             2001-08-01
2016-07-01  3              2016-10-01
2017-12-01  90             2025-06-01
2019-01-01  2              2019-03-01
Please add to comments if there are any issues with the above code.
All the best!
I believe the simplest and fastest way to solve this is to convert the dates to monthly periods with .dt.to_period('M'), add the values of the Months_to_add column, and then convert the result back to datetime with .dt.to_timestamp().
Using the sample data created by @Aruparna Maity:
Start_Date  Months_to_add
2014-06-01  23
2014-06-20  4
2000-10-01  10
2016-07-05  3
2017-12-15  90
2019-01-01  2
df['End_Date'] = ((df['Start_Date'].dt.to_period('M')) + df['Months_to_add']).dt.to_timestamp()
df.head(6)
#output
Start_Date Months_to_add End_Date
0 2014-06-01 23 2016-05-01
1 2014-06-20 4 2014-10-01
2 2000-10-01 10 2001-08-01
3 2016-07-05 3 2016-10-01
4 2017-12-15 90 2025-06-01
5 2019-01-01 2 2019-03-01
If the exact day is needed, repeat the process with daily periods, adding back the original day of the month:
df['End_Date'] = ((df['End_Date'].dt.to_period('D')) + df['Start_Date'].dt.day -1).dt.to_timestamp()
#output:
Start_Date Months_to_add End_Date
0 2014-06-01 23 2016-05-01
1 2014-06-20 4 2014-10-20
2 2000-10-01 10 2001-08-01
3 2016-07-05 3 2016-10-05
4 2017-12-15 90 2025-06-15
5 2019-01-01 2 2019-03-01
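One caveat with adding the day back as a plain number of days: when the start day does not exist in the target month (for example the 31st plus one month landing in February), the result spills over into the following month. A possible alternative, sketched here with made-up sample rows, applies pd.DateOffset per row, which clips to the last valid day of the month instead. It runs row by row, so it will be slower than the period approach on large frames.

```python
import pandas as pd

# Hypothetical sample rows, including a month-end start date
df = pd.DataFrame({
    'Start_Date': pd.to_datetime(['2014-06-20', '2016-07-05', '2020-01-31']),
    'Months_to_add': [4, 3, 1],
})

# One DateOffset per row; month-end overflow is clipped (Jan 31 + 1 month -> Feb 29)
df['End_Date'] = [
    start + pd.DateOffset(months=int(months))
    for start, months in zip(df['Start_Date'], df['Months_to_add'])
]
print(df)
```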
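Another way is to use numpy timedelta64. Note from the output that the months are applied as an average-length duration rather than as calendar months, so the results drift off the original day and time: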
df['date'] + np.timedelta64(plus_month_period, 'M')
0 2017-01-10 07:27:18
1 2017-01-31 07:27:18
Name: date, dtype: datetime64[ns]
