strftime is not recognizing the real datetime - python

I have a dataframe like this:
df = pd.DataFrame({"DateTime":["26/06/2014 22:05:16",
"25/06/2014 22:05:56",
"01/07/2014 22:05:30",
"01/08/2014 19:04:23"],
"Data":[20, 31, 25, 44]})
df
Out[9]:
DateTime Data
0 26/06/2014 22:05:16 20
1 25/06/2014 22:05:56 31
2 01/07/2014 22:05:30 25
3 01/08/2014 19:04:23 44
I would like to convert my DateTime column to datetime64 and specify a format. The original data is like DAY/MONTH/YEAR and then I would like to put them as YEAR-MONTH-DAY. I tried this:
df["DateTime"] = pd.to_datetime(df["DateTime"])
df["DateTime"] = df["DateTime"].dt.strftime('%Y-%m-%d %H:%M:%S')
df
Out[11]:
DateTime Data
0 2014-06-26 22:05:16 20
1 2014-06-25 22:05:56 31
2 2014-01-07 22:05:30 25
3 2014-01-08 19:04:23 44
The first two dates are fine, but the last two were converted incorrectly: the day and month were swapped. They should look like this:
2 2014-07-01 22:05:30 25
3 2014-08-01 19:04:23 44
Could anyone show me the correct way to convert this datetime?

By default, pd.to_datetime reads an ambiguous date such as 01/07/2014 month first (MM/DD). Since your data is DD/MM, tell to_datetime to parse the day first with dayfirst=True:
df['DateTime'] = pd.to_datetime(df["DateTime"], dayfirst=True).dt.strftime('%Y-%m-%d %H:%M:%S')
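A minimal sketch of the same fix on two of the question's rows, which also shows the alternative of passing an explicit format string. Note that a datetime64 column has no display format of its own; .dt.strftime converts it back to strings, so it is best applied only as a final formatting step. The variable names are illustrative.

```python
import pandas as pd

raw = pd.Series(["26/06/2014 22:05:16", "01/07/2014 22:05:30"])

# Option 1: tell the parser that the day comes first.
by_flag = pd.to_datetime(raw, dayfirst=True)

# Option 2: spell out the exact layout of the input strings.
by_format = pd.to_datetime(raw, format="%d/%m/%Y %H:%M:%S")

# Both options read the second row as 1 July, not 7 January.
print(by_flag.dt.month.tolist())  # [6, 7]

# strftime returns strings, not datetime64 values.
as_text = by_format.dt.strftime("%Y-%m-%d %H:%M:%S")
print(as_text.tolist())
```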

When converting to datetime, set dayfirst=True:
df["DateTime"] = pd.to_datetime(df["DateTime"], dayfirst=True)
df["DateTime"] = df["DateTime"].dt.strftime('%Y-%m-%d %H:%M:%S')
df
DateTime Data
0 2014-06-26 22:05:16 20
1 2014-06-25 22:05:56 31
2 2014-07-01 22:05:30 25
3 2014-08-01 19:04:23 44

Related

Pandas groupby month output is incorrect [duplicate]

My dataset has dates in the European format, and I'm struggling to convert them correctly with pd.to_datetime: for every date whose day is 12 or lower, the month and day are switched.
Is there an easy solution to this?
import pandas as pd
import datetime as dt
df = pd.read_csv(loc,dayfirst=True)
df['Date']=pd.to_datetime(df['Date'])
Is there a way to force datetime to acknowledge that the input is formatted as dd/mm/yy?
Thanks for the help!
Edit, a sample from my dates:
renewal["Date"].head()
Out[235]:
0 31/03/2018
2 30/04/2018
3 28/02/2018
4 30/04/2018
5 31/03/2018
Name: Earliest renewal date, dtype: object
After running the following:
renewal['Date']=pd.to_datetime(renewal['Date'],dayfirst=True)
I get:
Out[241]:
0 2018-03-31 #Correct
2 2018-04-01 #<-- this number is wrong and should be 01-04 instead
3 2018-02-28 #Correct
Add the format argument:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
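A short sketch of what the explicit format provides, assuming the dates are dd/mm/yyyy strings as in the sample: parsing becomes strict, so a value that does not fit the pattern raises an error (or becomes NaT with errors="coerce") instead of being silently swapped. It also shows that read_csv applies dayfirst only to columns listed in parse_dates, which the question's call omits. The CSV text and variable names are made up for illustration.

```python
import io
import pandas as pd

dates = pd.Series(["31/03/2018", "01/04/2018"])
parsed = pd.to_datetime(dates, format="%d/%m/%Y")
print(parsed.dt.month.tolist())  # [3, 4]

# A month-first string does not match the pattern: month "30" is invalid.
bad = pd.Series(["04/30/2018"])
coerced = pd.to_datetime(bad, format="%d/%m/%Y", errors="coerce")
print(coerced.isna().tolist())  # [True]

# read_csv honours dayfirst only for columns named in parse_dates.
csv_text = "Date,Value\n31/03/2018,1\n01/04/2018,2\n"
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], dayfirst=True)
print(df["Date"].dt.month.tolist())  # [3, 4]
```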
You can control the date construction directly if you define separate columns for 'year', 'month' and 'day', like this:
import pandas as pd
df = pd.DataFrame(
{'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)
date_parts = df['Date'].apply(lambda d: pd.Series(int(n) for n in d.split('/')))
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
date_parts
# day month year
# 0 1 3 2018
# 1 6 8 2018
# 2 31 3 2018
# 3 30 4 2018
df
# Date
# 0 2018-03-01
# 1 2018-08-06
# 2 2018-03-31
# 3 2018-04-30
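The same idea can be sketched without a Python-level lambda by splitting the strings with the .str accessor; pd.to_datetime accepts a DataFrame whose columns are named year, month and day. This is an alternative formulation under the same dd/mm/yyyy assumption, not the answer's original code.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["01/03/2018", "06/08/2018", "31/03/2018"]})

# Split "dd/mm/yyyy" into three integer columns in one vectorized step.
date_parts = df["Date"].str.split("/", expand=True).astype(int)
date_parts.columns = ["day", "month", "year"]

# Assemble datetimes from the named components.
df["Date"] = pd.to_datetime(date_parts)
print(df["Date"].dt.strftime("%Y-%m-%d").tolist())
# ['2018-03-01', '2018-08-06', '2018-03-31']
```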

Adding a datetime column in pandas dataframe from minute values

I have a data frame with a time column containing minutes from 0-1339, meaning 1440 minutes of a day. I want to add a datetime column representing the day 2021-3-21, including hh and mm, like this: 1980-03-01 11:00. I tried the following code:
from datetime import datetime, timedelta
date = datetime.date(2021, 3, 21)
days = date - datetime.date(1900, 1, 1)
df['datetime'] = pd.to_datetime(df['time'],format='%H:%M:%S:%f') + pd.to_timedelta(days, unit='d')
But I get an error like: descriptor 'date' requires a 'datetime.datetime' object but received a 'int'
Is there another way to solve this problem, or a way to fix this code? Please help me figure this out.
>>df
time
0
1
2
3
..
1339
I want to convert these minutes to a particular format, 1980-03-01 11:00, where I will use the date 2021-3-21 and convert the minutes to the hh:mm part. The dataframe will look like this:
>df
datetime time
2021-3-21 00:00 0
2021-3-21 00:01 1
2021-3-21 00:02 2
...
How can I format my data in this way?
Let's try pd.to_timedelta instead to turn the minutes in time into a duration, then add a Timestamp:
df['datetime'] = (
    pd.Timestamp('2021-3-21') + pd.to_timedelta(df['time'], unit='m')
)
df.head():
time datetime
0 0 2021-03-21 00:00:00
1 1 2021-03-21 00:01:00
2 2 2021-03-21 00:02:00
3 3 2021-03-21 00:03:00
4 4 2021-03-21 00:04:00
Complete Working Example with Sample Data:
import numpy as np
import pandas as pd
df = pd.DataFrame({'time': np.arange(0, 1440)})
df['datetime'] = (
    pd.Timestamp('2021-3-21') + pd.to_timedelta(df['time'], unit='m')
)
print(df)
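For context, the error in the question comes from the import: after from datetime import datetime, the name datetime is the class, so datetime.date(2021, 3, 21) calls the instance method date with an int instead of constructing a date. A sketch that avoids the clash by importing the module under an alias (dt is an arbitrary choice) and then formats the result as YYYY-MM-DD HH:MM strings:

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({"time": [0, 1, 61, 1339]})

# dt.date is the date class, reached through the module alias.
base = pd.Timestamp(dt.date(2021, 3, 21))

# Minutes since midnight become timedeltas added to the base day.
df["datetime"] = base + pd.to_timedelta(df["time"], unit="min")

# Optional: a string column without the seconds.
df["label"] = df["datetime"].dt.strftime("%Y-%m-%d %H:%M")
print(df["label"].tolist())
# ['2021-03-21 00:00', '2021-03-21 00:01', '2021-03-21 01:01', '2021-03-21 22:19']
```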

Pandas - How to extract HH:MM from datetime column in Python?

I just want to extract from my df HH:MM. How do I do it?
Here's a description of the column in the df:
count 810
unique 691
top 2018-07-25 11:14:00
freq 5
Name: datetime, dtype: object
The string value includes a full time stamp. The goal is to parse each row's HH:MM into another df, and to loop back over and extract just the %Y-%m-%d into another df.
Assume the df looks like
print(df)
date_col
0 2018-07-25 11:14:00
1 2018-08-26 11:15:00
2 2018-07-29 11:17:00
#convert from string to datetime
df['date_col'] = pd.to_datetime(df['date_col'])
#to get date only
print(df['date_col'].dt.date)
0 2018-07-25
1 2018-08-26
2 2018-07-29
#to get time:
print(df['date_col'].dt.time)
0 11:14:00
1 11:15:00
2 11:17:00
#to get hour and minute
print(df['date_col'].dt.strftime('%H:%M'))
0 11:14
1 11:15
2 11:17
First convert to datetime:
df['datetime'] = pd.to_datetime(df['datetime'])
Then you can do:
df2['datetime'] = df['datetime'].dt.strftime('%H:%M')
df3['datetime'] = df['datetime'].dt.strftime('%Y-%m-%d')
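A minimal sketch of the split the question asks for, using .dt.strftime (the .dt accessor has no strptime method). The snippet above does not define df2 and df3, so the two target frames here, times and dates, are hypothetical names built from scratch.

```python
import pandas as pd

df = pd.DataFrame({"datetime": ["2018-07-25 11:14:00", "2018-08-26 11:15:00"]})
df["datetime"] = pd.to_datetime(df["datetime"])

# One frame with the HH:MM part, one with the calendar date, both as strings.
times = pd.DataFrame({"time": df["datetime"].dt.strftime("%H:%M")})
dates = pd.DataFrame({"date": df["datetime"].dt.strftime("%Y-%m-%d")})

print(times["time"].tolist())  # ['11:14', '11:15']
print(dates["date"].tolist())  # ['2018-07-25', '2018-08-26']
```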
General solution (not pandas based)
import time
top = '2018-07-25 11:14:00'
time_struct = time.strptime(top, '%Y-%m-%d %H:%M:%S')
short_top = time.strftime('%H:%M', time_struct)
print(short_top)
Output
11:14

Converting numeric SAS dates to datetimes Pandas

I am currently trying to reproduce this: convert numeric sas date to datetime in Pandas, but get the following error:
"Python int too large to convert to C long"
Here is an example of my dates:
0 1.416096e+09
1 1.427069e+09
2 1.433635e+09
3 1.428624e+09
4 1.433117e+09
Name: dates, dtype: float64
Any ideas?
Here is a slightly hacky solution. If the date column is called 'date', try
df['date'] = pd.to_datetime(df['date'] - 315619200, unit = 's')
Here 315619200 is the number of seconds between Jan 1 1960 and Jan 1 1970.
You get
0 2004-11-15 00:00:00
1 2005-03-22 00:03:20
2 2005-06-05 23:56:40
3 2005-04-09 00:00:00
4 2005-05-31 00:03:20
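As a sketch of a less hard-coded variant, the offset can be derived from the two epochs, or the origin parameter of pd.to_datetime can be set to the SAS epoch directly. This assumes, as the answer does, that the values are seconds counted from 1960-01-01; the sample value and variable names are illustrative.

```python
import pandas as pd

sas_seconds = pd.Series([1416096000.0])

# Derive the 1960 -> 1970 gap instead of hard-coding it.
offset = (pd.Timestamp("1970-01-01") - pd.Timestamp("1960-01-01")).total_seconds()
print(int(offset))  # 315619200

# Let pandas count from the SAS epoch directly.
converted = pd.to_datetime(sas_seconds, unit="s", origin="1960-01-01")
print(converted.iloc[0])  # 2004-11-15 00:00:00
```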

Add months to a date in Pandas

I'm trying to figure out how to add 3 months to a date in a Pandas dataframe, while keeping it in the date format, so I can use it to lookup a range.
This is what I've tried:
#create dataframe
df = pd.DataFrame([pd.Timestamp('20161011'),
                   pd.Timestamp('20161101')], columns=['date'])
#create a future month period
plus_month_period = 3
#calculate date + future period
df['future_date'] = plus_month_period.astype("timedelta64[M]")
However, I get the following error:
AttributeError: 'int' object has no attribute 'astype'
You could use pd.DateOffset:
In [1756]: df.date + pd.DateOffset(months=plus_month_period)
Out[1756]:
0 2017-01-11
1 2017-02-01
Name: date, dtype: datetime64[ns]
Details
In [1757]: df
Out[1757]:
date
0 2016-10-11
1 2016-11-01
In [1758]: plus_month_period
Out[1758]: 3
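A small sketch of how pd.DateOffset behaves at month ends, using made-up dates: the result stays datetime64, and a day that does not exist in the target month is clamped to that month's last day.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2016-10-11", "2016-11-30"]))

# Calendar-aware shift: 30 November plus three months lands on 28 February.
shifted = dates + pd.DateOffset(months=3)
print(shifted.dt.strftime("%Y-%m-%d").tolist())
# ['2017-01-11', '2017-02-28']
```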
Suppose you have a dataframe of the following format, where you have to add integer months to a date column.
Start_Date  Months_to_add
2014-06-01  23
2014-06-01  4
2000-10-01  10
2016-07-01  3
2017-12-01  90
2019-01-01  2
In such a scenario, Zero's code or mattblack's code won't be useful, because the number of months differs from row to row. You have to apply a lambda function over the rows, where the function takes 2 arguments:
A date to which the months need to be added
A month value in integer format
You can use the following function:
# Importing required modules
from dateutil.relativedelta import relativedelta
# Defining the function
def add_months(start_date, delta_period):
    end_date = start_date + relativedelta(months=delta_period)
    return end_date
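After this you can use the following code snippet to add months to the Start_Date column. It uses progress_apply, which tqdm adds to pandas objects once tqdm.pandas() has been called; the plain apply method works the same way without the progress bar. Refer to this Stackoverflow answer on progress_apply: Progress indicator during pandas operations.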
from tqdm import tqdm
tqdm.pandas()
df["End_Date"] = df.progress_apply(lambda row: add_months(row["Start_Date"], row["Months_to_add"]), axis = 1)
Here's the full code, from dataset creation onward, for your reference:
import pandas as pd
from dateutil.relativedelta import relativedelta
from tqdm import tqdm
tqdm.pandas()
# Initialize a new dataframe
df = pd.DataFrame()
# Add Start Date column
df["Start_Date"] = ['2014-06-01T00:00:00.000000000',
'2014-06-01T00:00:00.000000000',
'2000-10-01T00:00:00.000000000',
'2016-07-01T00:00:00.000000000',
'2017-12-01T00:00:00.000000000',
'2019-01-01T00:00:00.000000000']
# To convert the date column to a datetime format
df["Start_Date"] = pd.to_datetime(df["Start_Date"])
# Add months column
df["Months_to_add"] = [23, 4, 10, 3, 90, 2]
# Defining the Add Months function
def add_months(start_date, delta_period):
    end_date = start_date + relativedelta(months=delta_period)
    return end_date
# Apply function on the dataframe using lambda operation.
df["End_Date"] = df.progress_apply(lambda row: add_months(row["Start_Date"], row["Months_to_add"]), axis = 1)
You will have the final output dataframe as follows.
Start_Date  Months_to_add  End_Date
2014-06-01  23             2016-05-01
2014-06-01  4              2014-10-01
2000-10-01  10             2001-08-01
2016-07-01  3              2016-10-01
2017-12-01  90             2025-06-01
2019-01-01  2              2019-03-01
Please add to comments if there are any issues with the above code.
All the best!
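The row-wise idea does not depend on tqdm or dateutil. A minimal sketch with two of the sample rows, assuming pd.DateOffset is an acceptable stand-in for relativedelta here:

```python
import pandas as pd

df = pd.DataFrame({
    "Start_Date": pd.to_datetime(["2014-06-01", "2017-12-01"]),
    "Months_to_add": [23, 90],
})

# One DateOffset per row, since each row adds a different number of months.
df["End_Date"] = [
    start + pd.DateOffset(months=int(months))
    for start, months in zip(df["Start_Date"], df["Months_to_add"])
]
print(df["End_Date"].dt.strftime("%Y-%m-%d").tolist())
# ['2016-05-01', '2025-06-01']
```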
I believe the simplest and fastest way to solve this is to convert the dates to monthly periods with .dt.to_period('M'), add the values of the Months_to_add column, and then convert the result back to datetime with .dt.to_timestamp().
Using the sample data created by @Aruparna Maity:
Start_Date  Months_to_add
2014-06-01  23
2014-06-20  4
2000-10-01  10
2016-07-05  3
2017-12-15  90
2019-01-01  2
df['End_Date'] = ((df['Start_Date'].dt.to_period('M')) + df['Months_to_add']).dt.to_timestamp()
df.head(6)
#output
Start_Date Months_to_add End_Date
0 2014-06-01 23 2016-05-01
1 2014-06-20 4 2014-10-01
2 2000-10-01 10 2001-08-01
3 2016-07-05 3 2016-10-01
4 2017-12-15 90 2025-06-01
5 2019-01-01 2 2019-03-01
If the exact day is needed, just repeat the process, but change the periods to days:
df['End_Date'] = ((df['End_Date'].dt.to_period('D')) + df['Start_Date'].dt.day -1).dt.to_timestamp()
#output:
Start_Date Months_to_add End_Date
0 2014-06-01 23 2016-05-01
1 2014-06-20 4 2014-10-20
2 2000-10-01 10 2001-08-01
3 2016-07-05 3 2016-10-05
4 2017-12-15 90 2025-06-15
5 2019-01-01 2 2019-03-01
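One edge case worth checking with the day-restoring step: when the start day does not exist in the target month, adding days to the first of the month spills into the following month. The sketch below uses a made-up row (30 November plus three months) and compares the result with pd.DateOffset, which clamps to the month end instead.

```python
import pandas as pd

df = pd.DataFrame({
    "Start_Date": pd.to_datetime(["2017-11-30"]),
    "Months_to_add": [3],
})

# Period-based approach: shift by months, then add back the day of month.
month_start = (df["Start_Date"].dt.to_period("M") + df["Months_to_add"]).dt.to_timestamp()
by_period = (month_start.dt.to_period("D") + df["Start_Date"].dt.day - 1).dt.to_timestamp()
print(by_period.iloc[0])  # 2018-03-02 00:00:00

# DateOffset clamps to the last valid day of the target month.
by_offset = df["Start_Date"] + pd.DateOffset(months=3)
print(by_offset.iloc[0])  # 2018-02-28 00:00:00
```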
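Another way uses numpy timedelta64. Note that the 'M' unit here is an average month length rather than a calendar month, which is why the results below land on a different day with a 07:27:18 time component: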
df['date'] + np.timedelta64(plus_month_period, 'M')
0 2017-01-10 07:27:18
1 2017-01-31 07:27:18
Name: date, dtype: datetime64[ns]
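A sketch of where the 07:27:18 comes from, and of month arithmetic that numpy does treat as calendar-aware. Three average months are a quarter of a mean Gregorian year of 365.2425 days; adding that many seconds reproduces the first row of the output above. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Calendar-aware in numpy: a month-resolution datetime64 plus a month timedelta64.
start_month = np.datetime64("2016-10", "M")
print(start_month + np.timedelta64(3, "M"))  # 2017-01

# Average-month arithmetic: 3/12 of a mean year of 365.2425 days, in seconds.
mean_year_seconds = 31_556_952
three_mean_months = pd.Timedelta(seconds=mean_year_seconds * 3 // 12)
print(pd.Timestamp("2016-10-11") + three_mean_months)  # 2017-01-10 07:27:18
```

The month-resolution form drops the day of month, so pd.DateOffset remains the simpler choice when the day has to be preserved.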
