Pandas - How to extract HH:MM from a datetime column in Python?

I just want to extract HH:MM from my df. How do I do it?
Here's a description of the column in the df:
count 810
unique 691
top 2018-07-25 11:14:00
freq 5
Name: datetime, dtype: object
The string value includes a full timestamp. The goal is to parse each row's HH:MM into another df, and then go back over the rows and extract just the %Y-%m-%d part into another df.

Assume the df looks like
print(df)
date_col
0 2018-07-25 11:14:00
1 2018-08-26 11:15:00
2 2018-07-29 11:17:00
#convert from string to datetime
df['date_col'] = pd.to_datetime(df['date_col'])
#to get date only
print(df['date_col'].dt.date)
0 2018-07-25
1 2018-08-26
2 2018-07-29
#to get time:
print(df['date_col'].dt.time)
0 11:14:00
1 11:15:00
2 11:17:00
#to get hour and minute
print(df['date_col'].dt.strftime('%H:%M'))
0 11:14
1 11:15
2 11:17
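The question also asks for the two parts to end up in separate DataFrames. A self-contained sketch of that, with no explicit loop because the .dt accessor works on the whole column at once. The names df_time and df_date and the explicit format string are illustrative assumptions chosen to match the sample strings:

```python
import pandas as pd

# Sample data shaped like the question's column (strings, dtype object)
df = pd.DataFrame({"date_col": ["2018-07-25 11:14:00",
                                "2018-08-26 11:15:00",
                                "2018-07-29 11:17:00"]})

# Parse once, with an explicit format matching the sample strings
parsed = pd.to_datetime(df["date_col"], format="%Y-%m-%d %H:%M:%S")

# Hypothetical output frames: one for HH:MM, one for the calendar date
df_time = pd.DataFrame({"hh_mm": parsed.dt.strftime("%H:%M")})
df_date = pd.DataFrame({"date": parsed.dt.strftime("%Y-%m-%d")})

print(df_time)
print(df_date)
```

Both new frames keep the original index, so rows can still be matched back to df.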

First convert to datetime:
df['datetime'] = pd.to_datetime(df['datetime'])
Then you can do:
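# strftime formats a datetime as text (strptime is the parsing direction and has no .dt method)
df2 = pd.DataFrame({'datetime': df['datetime'].dt.strftime('%H:%M')})
df3 = pd.DataFrame({'datetime': df['datetime'].dt.strftime('%Y-%m-%d')})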

General solution (not pandas based)
import time
top = '2018-07-25 11:14:00'
time_struct = time.strptime(top, '%Y-%m-%d %H:%M:%S')
short_top = time.strftime('%H:%M', time_struct)
print(short_top)
Output
11:14
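The same idea with the standard-library datetime module instead of time (a swapped-in alternative, not the answer's code), which also gives the %Y-%m-%d part the question asks for:

```python
from datetime import datetime

top = '2018-07-25 11:14:00'
# Parse the full timestamp string into a datetime object
parsed = datetime.strptime(top, '%Y-%m-%d %H:%M:%S')

short_top = parsed.strftime('%H:%M')      # hour and minute only
date_only = parsed.strftime('%Y-%m-%d')   # calendar date only
print(short_top)
print(date_only)
```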

Related

pandas - combine time and date from two dataframe columns to a datetime column

This is a follow-up question to the accepted solution here.
I have a pandas dataframe:
In one column 'time' is the time stored in the following format: 'HHMMSS' (e.g. 203412 means 20:34:12).
In another column 'date' the date is stored in the following format: 'YYmmdd' (e.g. 200712 means 2020-07-12). YY is the number of years after 2000.
Example:
import pandas as pd
data = {'time': ['123455', '000010', '100000'],
'date': ['200712', '210601', '190610']}
df = pd.DataFrame(data)
print(df)
# time date
#0 123455 200712
#1 000010 210601
#2 100000 190610
I need a third column which contains the combined datetime format (e.g. 2020-07-12 12:34:55) of the two other columns. So far, I can only modify the time but I do not know how to add the date.
df['datetime'] = pd.to_datetime(df['time'], format='%H%M%S')
print(df)
# time date datetime
#0 123455 200712 1900-01-01 12:34:55
#1 000010 210601 1900-01-01 00:00:10
#2 100000 190610 1900-01-01 10:00:00
How can I add in column df['datetime'] the date from column df['date'], so that the dataframe is:
time date datetime
0 123455 200712 2020-07-12 12:34:55
1 000010 210601 2021-06-01 00:00:10
2 100000 190610 2019-06-10 10:00:00
I found this question, but I am not exactly sure how to use it for my purpose.
You can join the columns first and then specify the format:
df['datetime'] = pd.to_datetime(df['date'] + df['time'], format='%y%m%d%H%M%S')
print(df)
time date datetime
0 123455 200712 2020-07-12 12:34:55
1 000010 210601 2021-06-01 00:00:10
2 100000 190610 2019-06-10 10:00:00
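If the columns are integers, convert them to strings and pad them back to six digits, because an integer drops leading zeros (000010 becomes 10):
df['datetime'] = pd.to_datetime(df['date'].astype(str).str.zfill(6) + df['time'].astype(str).str.zfill(6), format='%y%m%d%H%M%S')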
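An alternative closer to the asker's first attempt parses the two columns separately and adds them: the date becomes a midnight Timestamp and the time becomes a Timedelta (duration since midnight). A sketch assuming string columns as in the example:

```python
import pandas as pd

data = {'time': ['123455', '000010', '100000'],
        'date': ['200712', '210601', '190610']}
df = pd.DataFrame(data)

# Date column alone: parses to midnight of that day
date_part = pd.to_datetime(df['date'], format='%y%m%d')

# Time column as a duration: 'HHMMSS' -> 'HH:MM:SS' -> Timedelta
t = df['time']
time_part = pd.to_timedelta(t.str[0:2] + ':' + t.str[2:4] + ':' + t.str[4:6])

df['datetime'] = date_part + time_part
print(df)
```

Joining the strings and parsing once, as in the answer, is shorter; splitting is mainly useful when the time column is also needed on its own as a duration.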

Incorrect order of M/D in datetimes

I have a date column in my csv file
This is my Date column data
14/3/18
28/3/18
9/4/2018
How do I get all of these parsed correctly as dates in 2018?
I have tried this
df['DateTime'] = pd.to_datetime(df['Date'])
print (df['DateTime'])
but it returns:
1 2018-03-14
2 2018-03-28
3 2018-09-04
In the last row, 09 became the month, but 04 is supposed to be the month.
Add parameter dayfirst=True:
df['DateTime'] = pd.to_datetime(df['Date'], dayfirst=True)
print (df)
Date DateTime
0 14/3/18 2018-03-14
1 28/3/18 2018-03-28
2 9/4/2018 2018-04-09
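If you only need to change how the dates are displayed, you can use .dt.strftime (note that this returns strings, not datetimes):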
df['DateTime'] = pd.to_datetime(df['DateTime']).dt.strftime("%Y-%d-%m")
Output:
0 2018-14-03
1 2018-28-03
2 2018-04-09
Name: DateTime, dtype: object
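When the column mixes two- and four-digit years, an explicit format avoids per-element guessing (pandas 2.0 and later infer one format from the first value, so mixed input like this may fail to parse with dayfirst=True alone). A sketch that first expands two-digit years, assuming every two-digit year belongs to the 2000s:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['14/3/18', '28/3/18', '9/4/2018']})

# Split into day, month and year strings
parts = df['Date'].str.split('/', expand=True)
day, month, year = parts[0], parts[1], parts[2]

# Assumption: a two-digit year means 20YY
year = year.where(year.str.len() == 4, '20' + year)

# Parse with an explicit day-first format
df['DateTime'] = pd.to_datetime(day + '/' + month + '/' + year, format='%d/%m/%Y')
print(df)
```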

Date Offset Pandas Field Based Off Another Field

I have a data frame with a field time of timestamps with dates, and another column period. How can I add a number of days to time based on period?
Current Output:
time period
------------------------------
2020-04-28 10:00:00 1
2020-04-27 12:34:56 3
Expected Output
time
---------------
2020-04-29 10:00:00
2020-04-30 12:34:56
If I try df['time'] = df['time'] + pd.DateOffset(df['period']) I get the error TypeError: `n` argument must be an integer, got <class 'pandas.core.series.Series'>, because the whole column is passed to DateOffset, which expects a single integer. How can this be accomplished?
Because a number of days can be converted to timedeltas with to_timedelta, you can use:
df['time'] = df['time'] + pd.to_timedelta(df['period'], unit='d')
print (df)
time period
0 2020-04-29 10:00:00 1
1 2020-04-30 12:34:56 3
But if you want to add months, you need DateOffset:
df['time'] = df['time'] + df['period'].apply(lambda x: pd.DateOffset(months=x))
print (df)
time period
0 2020-05-28 10:00:00 1
1 2020-07-27 12:34:56 3
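Month timedeltas use a fixed average ("default") month length rather than calendar months, so the result is not aligned to the calendar (recent pandas versions raise a ValueError for unit='M'):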
df['time'] = df['time'] + pd.to_timedelta(df['period'], unit='M')
print (df)
time period
0 2020-05-28 20:29:06 1
1 2020-07-27 20:02:14 3
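A self-contained sketch of both per-row offsets on the question's sample data. For months it builds each shifted Timestamp with a list comprehension instead of adding a Series of DateOffset objects, which avoids object-dtype arithmetic; that swap is a design choice, not part of the answer above. The column names plus_days and plus_months are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'time': pd.to_datetime(['2020-04-28 10:00:00',
                                           '2020-04-27 12:34:56']),
                   'period': [1, 3]})

# Days: vectorized, since a day count maps to a fixed-length Timedelta
df['plus_days'] = df['time'] + pd.to_timedelta(df['period'], unit='D')

# Calendar months: one DateOffset per row
df['plus_months'] = pd.Series(
    [t + pd.DateOffset(months=int(p)) for t, p in zip(df['time'], df['period'])],
    index=df.index)
print(df)
```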

Add months to a date in Pandas

I'm trying to figure out how to add 3 months to a date in a Pandas dataframe, while keeping it in the date format, so I can use it to look up a range.
This is what I've tried:
#create dataframe
df = pd.DataFrame([pd.Timestamp('20161011'),
pd.Timestamp('20161101') ], columns=['date'])
#create a future month period
plus_month_period = 3
#calculate date + future period
df['future_date'] = plus_month_period.astype("timedelta64[M]")
However, I get the following error:
AttributeError: 'int' object has no attribute 'astype'
You could use pd.DateOffset
In [1756]: df.date + pd.DateOffset(months=plus_month_period)
Out[1756]:
0 2017-01-11
1 2017-02-01
Name: date, dtype: datetime64[ns]
Details
In [1757]: df
Out[1757]:
date
0 2016-10-11
1 2016-11-01
In [1758]: plus_month_period
Out[1758]: 3
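A self-contained version of the DateOffset answer. The extra row 2016-11-30 is a hypothetical addition showing that calendar-month arithmetic clips to the last day of a shorter target month instead of overflowing:

```python
import pandas as pd

df = pd.DataFrame([pd.Timestamp('20161011'),
                   pd.Timestamp('20161101'),
                   pd.Timestamp('20161130')],  # hypothetical extra row
                  columns=['date'])
plus_month_period = 3

# Shift every date by the same number of calendar months
df['future_date'] = df['date'] + pd.DateOffset(months=plus_month_period)
print(df)
```

The result stays datetime64, so it can be used directly for range lookups.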
Suppose you have a dataframe of the following format, where you have to add integer months to a date column.
Start_Date  Months_to_add
2014-06-01             23
2014-06-01              4
2000-10-01             10
2016-07-01              3
2017-12-01             90
2019-01-01              2
In such a scenario, Zero's code or mattblack's code won't work directly, because the number of months differs from row to row. You can instead apply a function over the rows that takes 2 arguments:
A date to which months need to be added
A month value in integer format
You can use the following function:
# Importing required modules
from dateutil.relativedelta import relativedelta
# Defining the function
def add_months(start_date, delta_period):
    end_date = start_date + relativedelta(months=delta_period)
    return end_date
After this, you can use the following snippet to add months to the Start_Date column. It uses progress_apply, which tqdm adds to pandas once tqdm.pandas() has been called; see the Stack Overflow answer on progress_apply, "Progress indicator during pandas operations".
from tqdm import tqdm
tqdm.pandas()
df["End_Date"] = df.progress_apply(lambda row: add_months(row["Start_Date"], row["Months_to_add"]), axis = 1)
Here's the full code, from dataset creation onward, for your reference:
import pandas as pd
from dateutil.relativedelta import relativedelta
from tqdm import tqdm
tqdm.pandas()
# Initialize a new dataframe
df = pd.DataFrame()
# Add Start Date column
df["Start_Date"] = ['2014-06-01T00:00:00.000000000',
'2014-06-01T00:00:00.000000000',
'2000-10-01T00:00:00.000000000',
'2016-07-01T00:00:00.000000000',
'2017-12-01T00:00:00.000000000',
'2019-01-01T00:00:00.000000000']
# To convert the date column to a datetime format
df["Start_Date"] = pd.to_datetime(df["Start_Date"])
# Add months column
df["Months_to_add"] = [23, 4, 10, 3, 90, 2]
# Defining the Add Months function
def add_months(start_date, delta_period):
    end_date = start_date + relativedelta(months=delta_period)
    return end_date
# Apply function on the dataframe using lambda operation.
df["End_Date"] = df.progress_apply(lambda row: add_months(row["Start_Date"], row["Months_to_add"]), axis = 1)
You will have the final output dataframe as follows.
Start_Date  Months_to_add    End_Date
2014-06-01             23  2016-05-01
2014-06-01              4  2014-10-01
2000-10-01             10  2001-08-01
2016-07-01              3  2016-10-01
2017-12-01             90  2025-06-01
2019-01-01              2  2019-03-01
Please add to comments if there are any issues with the above code.
All the best!
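If no progress bar is needed, the same row-wise relativedelta logic runs with plain DataFrame.apply, which makes tqdm optional. A minimal sketch on the dataset above; the int() call is an added guard, not part of the original function:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({
    'Start_Date': pd.to_datetime(['2014-06-01', '2014-06-01', '2000-10-01',
                                  '2016-07-01', '2017-12-01', '2019-01-01']),
    'Months_to_add': [23, 4, 10, 3, 90, 2],
})

def add_months(start_date, delta_period):
    # relativedelta shifts by calendar months; int() guards against NumPy integer types
    return start_date + relativedelta(months=int(delta_period))

# Plain apply over rows (axis=1) instead of tqdm's progress_apply
df['End_Date'] = df.apply(
    lambda row: add_months(row['Start_Date'], row['Months_to_add']), axis=1)
print(df)
```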
I believe that the simplest and most efficient (fastest) way to solve this is to convert the dates to monthly periods with to_period('M'), add the values of the Months_to_add column, and then convert the result back to datetime with .dt.to_timestamp().
Using the sample data created by @Aruparna Maity (with some start days changed):
Start_Date  Months_to_add
2014-06-01             23
2014-06-20              4
2000-10-01             10
2016-07-05              3
2017-12-15             90
2019-01-01              2
df['End_Date'] = ((df['Start_Date'].dt.to_period('M')) + df['Months_to_add']).dt.to_timestamp()
df.head(6)
#output
Start_Date Months_to_add End_Date
0 2014-06-01 23 2016-05-01
1 2014-06-20 4 2014-10-01
2 2000-10-01 10 2001-08-01
3 2016-07-05 3 2016-10-01
4 2017-12-15 90 2025-06-01
5 2019-01-01 2 2019-03-01
If the exact day is needed, repeat the process with daily periods, adding back the original day of the month:
df['End_Date'] = ((df['End_Date'].dt.to_period('D')) + df['Start_Date'].dt.day -1).dt.to_timestamp()
#output:
Start_Date Months_to_add End_Date
0 2014-06-01 23 2016-05-01
1 2014-06-20 4 2014-10-20
2 2000-10-01 10 2001-08-01
3 2016-07-05 3 2016-10-05
4 2017-12-15 90 2025-06-15
5 2019-01-01 2 2019-03-01
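The day-restoring step above overflows when the start day does not exist in the target month (for example, 31 January plus one month lands in early March). A sketch that clamps the day with days_in_month; the 2016-01-31 row is a hypothetical addition to exercise that case:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Start_Date': pd.to_datetime(['2014-06-20', '2017-12-15', '2016-01-31']),
    'Months_to_add': [4, 90, 1],  # last row is a hypothetical month-end case
})

# Shift by whole months at month resolution, landing on the 1st of the target month
month_start = (df['Start_Date'].dt.to_period('M') + df['Months_to_add']).dt.to_timestamp()

# Keep the original day, but never past the last day of the target month
day = np.minimum(df['Start_Date'].dt.day, month_start.dt.days_in_month)
df['End_Date'] = month_start + pd.to_timedelta(day - 1, unit='D')
print(df)
```

This matches the clipping behaviour of pd.DateOffset(months=n) while staying vectorized over a per-row month count.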
Another way, using NumPy's timedelta64:
import numpy as np
df['date'] + np.timedelta64(plus_month_period, 'M')
0 2017-01-10 07:27:18
1 2017-01-31 07:27:18
Name: date, dtype: datetime64[ns]
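The 07:27:18 in that output comes from NumPy treating a month as an average Gregorian month, 365.2425 / 12 = 30.436875 days, rather than a calendar month (the same average explains the to_timedelta(..., unit='M') output earlier). A standard-library sketch reproducing the arithmetic for the first row; AVG_MONTH is an illustrative name:

```python
from datetime import datetime, timedelta

# Average Gregorian year (365.2425 days) divided into 12 equal months
AVG_MONTH = timedelta(days=365.2425) / 12   # 30 days, 10:29:06

start = datetime(2016, 10, 11)
shifted = start + 3 * AVG_MONTH
print(shifted)  # 2017-01-10 07:27:18
```

Use pd.DateOffset when the result has to land on the same calendar day.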
