Datetime formatting in Python

I want to convert strings into datetimes. My column currently contains three different formats:
01-05-21 5:50 (month-day-year hour:min)
13/01/2021 05:50:00 (day/month/year hour:min:sec)
1/1/21 0:00 (day/month/year hour:min)
I want to convert them all into a single format, say 13-01-21 05:50:00 and 05-01-21 05:50:00 (day-month-year hour:min:sec).
I am not able to handle all of these formats in a single piece of Python code.
df.head()
ts
0 01-05-21 5:50
1 01-05-21 6:00
2 13/01/2021 05:00:00
3 13/01/2021 05:10:00
4 1/1/21 0:00
(Three different formats)

https://stackabuse.com/how-to-format-dates-in-python/
This article explains how to format dates in Python. You can add the hyphens between the numbers yourself.

You may use pd.to_datetime here with a boolean mask that chooses between two format strings when converting the string column to datetime. This covers the first two layouts; the third one (1/1/21 0:00) matches neither format and would need its own mask:
df["dt"] = pd.NaT
mask = df["ts"].str.contains(r'^\d{2}-\d{2}-\d{2} \d{1,2}:\d{2}$')
df.loc[mask, "dt"] = pd.to_datetime(df.loc[mask, "ts"], format='%m-%d-%y %H:%M')
df.loc[~mask, "dt"] = pd.to_datetime(df.loc[~mask, "ts"], format='%d/%m/%Y %H:%M:%S')
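As a sketch of one way to cover all three layouts at once, the code below tries each format string in turn with errors='coerce' and keeps the first successful parse for each row. It assumes 1/1/21 0:00 is day/month/two-digit-year, as the question states; the FORMATS list and the dt/out column names are illustrative choices, not taken from the answers above.

```python
import pandas as pd

# Sample data copied from the question's df.head() output
df = pd.DataFrame({"ts": ["01-05-21 5:50", "01-05-21 6:00",
                          "13/01/2021 05:00:00", "13/01/2021 05:10:00",
                          "1/1/21 0:00"]})

# One format per layout described in the question, tried in this order
# (assumption: 1/1/21 is day/month/two-digit year)
FORMATS = ["%m-%d-%y %H:%M", "%d/%m/%Y %H:%M:%S", "%d/%m/%y %H:%M"]

# Strings that don't match a format become NaT instead of raising
parsed = pd.to_datetime(df["ts"], format=FORMATS[0], errors="coerce")
for fmt in FORMATS[1:]:
    # Only rows that are still NaT take a value from the next attempt
    parsed = parsed.fillna(pd.to_datetime(df["ts"], format=fmt, errors="coerce"))

df["dt"] = parsed
# Render everything in the single target layout: day-month-year hour:min:sec
df["out"] = df["dt"].dt.strftime("%d-%m-%y %H:%M:%S")
print(df)
```

Order matters: a string that matches an earlier format is never tried against later ones, so put the layout you trust most first. Any row that matches none of the formats stays NaT, which makes unexpected layouts easy to find.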

Related

Convert a column to a specific time format which contains different types of time formats in python

This is my data frame
df = pd.DataFrame({
'Time': ['10:00PM', '15:45:00', '13:40:00AM','5:00']
})
Time
0 10:00PM
1 15:45:00
2 13:40:00AM
3 5:00
I need to convert all the times to a single format. The expected output is given below.
Time
0 22:00:00
1 15:45:00
2 01:40:00
3 05:00:00
I tried using the str.split and str.endswith methods, but that solution is complicated. Is there a better way to achieve this?
Thanks in advance!
Here you go. One thing to mention, though: 13:40:00AM will raise an error, because a) 13 is not a valid 12-hour clock value (AM/PM hours only go from 1 to 12) and b) 13 would be PM, so it cannot also be AM :) The example therefore uses 01:40:00AM instead.
Cheers
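import pandas as pd
df = pd.DataFrame({'Time': ['10:00PM', '15:45:00', '01:40:00AM', '5:00']})
# pandas >= 2.0 needs format='mixed' because the strings don't share one layout (omit it on older pandas)
df['Time'] = pd.to_datetime(df['Time'], format='mixed')
print(df['Time'].dt.time)
<<< 22:00:00
<<< 15:45:00
<<< 01:40:00
<<< 05:00:00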
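If you would rather not rely on automatic format guessing, a minimal sketch is to list the layouts you expect and try them one after another with errors='coerce'. The four format strings below are an assumption based on the four sample values; real data may need more.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["10:00PM", "15:45:00", "01:40:00AM", "5:00"]})

# Candidate layouts, one per style seen in the sample (assumption)
FORMATS = ["%I:%M%p", "%I:%M:%S%p", "%H:%M:%S", "%H:%M"]

parsed = pd.to_datetime(df["Time"], format=FORMATS[0], errors="coerce")
for fmt in FORMATS[1:]:
    # Fill only rows still unparsed; non-matching strings become NaT
    parsed = parsed.fillna(pd.to_datetime(df["Time"], format=fmt, errors="coerce"))

# Render as HH:MM:SS strings; rows that matched no format become NaN
df["Time"] = parsed.dt.strftime("%H:%M:%S")
print(df)
```

With this approach an invalid value such as 13:40:00AM matches none of the formats and ends up as NaN instead of raising, so you can inspect or drop those rows afterwards.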

Change date column to datetime

I am working on a stock market analysis where I look at past balance sheets and income statements, and I want to convert the date column, which stores dates as strings of the form "2021-09-30", into datetimes. I am trying to use pd.to_datetime, but it gives me an error.
When I run
df['datekey'] = pd.to_datetime(df['datekey'], format='%Y-%m-%d')
I get
"ValueError: time data "2021-09-30" doesn't match format specified"
when it should (if I am doing this correctly).
This column doesn't have a time value in it. It is just (for all dates) "2021-09-30".
You have extra quotes and spaces in your data. Try:
df["datekey"] = pd.to_datetime(df["datekey"].str.replace(" ","").str.strip('"'), format="%Y-%m-%d")
>>> df["datekey"]
0 2021-09-30
1 2021-06-30
2 2021-03-31
3 2020-12-31
4 2020-09-30
5 2020-06-30
6 2020-03-31
7 2019-12-31
8 2019-09-30
9 2019-06-30
Name: datekey, dtype: datetime64[ns]
It seems the values themselves are enclosed in double quotes, so you need to include the quotes in your format as well:
df['datekey'] = pd.to_datetime(df['datekey'], format='"%Y-%m-%d"')
Alternatively, you can strip off the quotes before converting to datetime, this is useful if some values are not enclosed by double quotes:
df['datekey'] = pd.to_datetime(df['datekey'].str.strip('"'), format='%Y-%m-%d')
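Stray characters like these are invisible in a normal DataFrame printout. A small sketch of one way to spot them is to print the repr of a few values before cleaning; the sample values here are hypothetical, chosen to mimic the quotes and spaces described above.

```python
import pandas as pd

# Hypothetical values with surrounding quotes and a leading space
df = pd.DataFrame({"datekey": ['"2021-09-30"', ' "2021-06-30"', "2021-03-31"]})

# repr() makes hidden quotes and whitespace visible
print(df["datekey"].map(repr).tolist())

# Trim whitespace first, then any surrounding double quotes
cleaned = df["datekey"].str.strip().str.strip('"')
df["datekey"] = pd.to_datetime(cleaned, format="%Y-%m-%d")
print(df["datekey"])
```

Stripping whitespace before the quotes matters: str.strip('"') alone would leave ' "2021-06-30' untouched at the front, and the format would still fail.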

How to check if a column has a particular Date format or not using DATETIME in python?

I am new to Python. I have a DataFrame with a date column whose values come in different formats. I would like to check whether each value follows a particular date format and drop the rows that do not. I have tried using try/except while iterating over the rows, but I am looking for a faster way to do this check. Is there one, for example using the datetime library?
My code:
Date_format = '%Y%m%d'
df =
Date abc
0 2020-03-22 q
1 03-12-2020 w
2 55552020 e
3 25122020 r
4 12/25/2020 r
5 1212202033 y
Expected output:
Date abc
0 2020-03-22 q
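You could try the following (the output shown is from pandas versions before 2.0, which parse each value individually; pandas 2.0+ infers one format from the first value unless you pass format='mixed'):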
pd.to_datetime(df.Date, errors='coerce')
0 2020-03-22
1 2020-03-12
2 NaT
3 NaT
4 2020-12-25
5 NaT
Rows that became NaT are then easy to drop.
EDIT:
For a given format you can still leverage pd.to_datetime:
datetimes = pd.to_datetime(df.Date, format='%Y-%m-%d', errors='coerce')
datetimes
0 2020-03-22
1 NaT
2 NaT
3 NaT
4 NaT
5 NaT
df.loc[datetimes.notnull()]
Also note that I am using the format %Y-%m-%d, which I think is the one you want based on your expected output (not the one you gave as Date_format).
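Putting the EDIT together, a self-contained sketch using the question's data shows the whole check as one vectorized call with no row loop or try/except. DATE_FORMAT is an illustrative constant holding the %Y-%m-%d format suggested above.

```python
import pandas as pd

# Data from the question; the Date column holds strings
df = pd.DataFrame({
    "Date": ["2020-03-22", "03-12-2020", "55552020", "25122020", "12/25/2020", "1212202033"],
    "abc": ["q", "w", "e", "r", "r", "y"],
})

DATE_FORMAT = "%Y-%m-%d"

# Anything that does not match DATE_FORMAT exactly becomes NaT
parsed = pd.to_datetime(df["Date"], format=DATE_FORMAT, errors="coerce")

# Keep only the rows that parsed
result = df.loc[parsed.notna()]
print(result)
```

Because format is given explicitly, pandas does not guess per value, so ambiguous strings like 03-12-2020 are rejected rather than silently reinterpreted.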

Convert YYYYMMDD into YYYY-MM-DD and HHMMSS into HH:MM:SS for candlestick plotting

I've been trying to find an answer for 4 hours, but no luck. Any help will be very much appreciated.
Goal: convert 20170103 into 2017-01-03 and 022100 into 02:21:00 for candlestick plotting
date_int = 20170103
df = pd.DataFrame({'date':[date_int]*10})
df['date'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
print(df['date'])
time_int = 020100
df = pd.DataFrame({'time':[time_int]*10})
df['time'] = df['time'].apply(lambda x: pd.to_datetime(str(x), format='%H:%M:%S'))
print(df['time'])
but the second snippet raises an 'invalid token' error.
I also noticed that this code runs very slowly. If there is a more efficient way, please let me know. Thank you so much in advance for your help.
To expand on my comments, you have a few things wrong here. First, as mentioned, the format used in your second example is wrong. Your data has the format '%H%M%S', so that is the one you need to pass as the argument.
When using pd.to_datetime, the format argument describes the layout of the input data so that it can be parsed correctly.
To then render the parsed values in a different layout, chain Series.dt.strftime:
date_int = 20170103
df = pd.DataFrame({'date':[date_int]*10})
df.date = pd.to_datetime(df.date, format='%Y%m%d').dt.strftime('%Y-%m-%d')
date
0 2017-01-03
1 2017-01-03
2 2017-01-03
3 2017-01-03
4 2017-01-03
5 2017-01-03
6 2017-01-03
7 2017-01-03
8 2017-01-03
9 2017-01-03
So similarly for your second example you need:
df.time = pd.to_datetime(df.time, format='%H%M%S').dt.strftime('%H:%M:%S')
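The leading-zero problem behind the 'invalid token' error also bites when times are stored as integers, because 022100 becomes 22100. A minimal sketch, assuming integer date and time columns (the sample values are hypothetical), pads the times with str.zfill and parses both columns in one vectorized call instead of calling pd.to_datetime per row with apply, which should also address the speed concern:

```python
import pandas as pd

# Hypothetical sample: YYYYMMDD and HHMMSS stored as ints,
# so the time 022100 arrives without its leading zero as 22100
df = pd.DataFrame({"date": [20170103, 20170103, 20170104],
                   "time": [22100, 93000, 160000]})

# Pad times back to six digits before parsing
time_str = df["time"].astype(str).str.zfill(6)
date_str = df["date"].astype(str)

# Parse the whole column at once into a single timestamp per row
df["timestamp"] = pd.to_datetime(date_str + " " + time_str, format="%Y%m%d %H%M%S")

# Formatted strings, if separate date/time columns are still wanted
df["date_fmt"] = df["timestamp"].dt.strftime("%Y-%m-%d")
df["time_fmt"] = df["timestamp"].dt.strftime("%H:%M:%S")
print(df)
```

A single timestamp column is usually the more convenient input for candlestick plotting than two separately formatted string columns.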
Here it is, based on my comment above. (To fix the 'invalid token' error, make the value a string by surrounding it with single or double quotes.)
time_int = '020100'
df = pd.DataFrame({'time':[time_int]*10})
df['time'] = df['time'].apply(lambda x: pd.to_datetime(str(x), format='%H%M%S'))
df['time'] = df['time'].dt.time
print(df['time'])
Output:
0 02:01:00
1 02:01:00
2 02:01:00
3 02:01:00
4 02:01:00
5 02:01:00
6 02:01:00
7 02:01:00
8 02:01:00
9 02:01:00
Looking at the question, it seems the original post was really two test cases for debugging code that uses the pandas package. The comment that the code ran slowly suggests that a file of dates and times is being read. Since candlestick plots can be built from datetime objects, perhaps this can all be solved more simply.
When reading each line, pull the date and time out as a single string, say '20170103 022100'.
Use datetime to parse directly to a datetime object.
import datetime as dt
ts='20170103 022100'
result=dt.datetime.strptime(ts,'%Y%m%d %H%M%S')
What's nice about strptime is that a single space in the format matches any run of whitespace, so the string still parses correctly if it contains several spaces (or a tab) between the date and the time.
Hope that simplifies things.
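A short sketch of that whitespace behaviour, using hypothetical lines such as might be read from a file, shows that the same format string handles one space, several spaces, or a tab:

```python
import datetime as dt

# Hypothetical lines with varying whitespace between date and time
lines = ["20170103 022100", "20170103    022200", "20170103\t022300"]

# A single space in the format matches any run of whitespace in the input
stamps = [dt.datetime.strptime(line, "%Y%m%d %H%M%S") for line in lines]
for s in stamps:
    print(s.isoformat(sep=" "))
```

In a real loop over a file you would typically call line.strip() first, or slice out just the date and time fields, since trailing newlines or extra columns would not match the format.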

Setting a dataframe columns to type datetime when there are blanks in the columns

I have a DataFrame (df) with two date columns, whose head looks like this:
name start end
0 John 2018-11-09 00:00:00 2012-03-01 00:00:00
1 Steve 1990-09-03 00:00:00
2 Debs 1977-09-07 00:00:00 2012-07-02 00:00:00
3 Mandy 2009-01-09 00:00:00
4 Colin 1993-08-22 00:00:00 2002-06-03 00:00:00
The start and end columns have the type object. I want to change the type to datetime so I can use the following:
referenceError = DeptTemplate['start'] > DeptTemplate['end']
I am trying to change the type using:
df['start'].dt.strftime('%d/%m/%Y')
df['end'].dt.strftime('%d/%m/%Y')
but I think the rows that have no date in these columns are causing a problem. How can I handle the blank values so that I can change the type to datetime and run my analysis?
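As shown in the pd.to_datetime docs, you can control how unparseable values are handled with the errors kwarg. You can also pass the format kwarg, which describes the layout of the input strings (here %Y-%m-%d %H:%M:%S), not the layout you want to display.
# Bad or blank values will be NaT
df["start"] = pd.to_datetime(df["start"], errors='coerce', format='%Y-%m-%d %H:%M:%S')
df["end"] = pd.to_datetime(df["end"], errors='coerce', format='%Y-%m-%d %H:%M:%S')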
As mentioned in the comments, you can prepare the column with replace if you absolutely must use strftime.
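A self-contained sketch of the whole workflow, assuming the blank end dates are stored as empty strings (the question does not say how they are stored), converts both columns and then runs the start > end check from the question:

```python
import pandas as pd

# Rows from the question; blank end dates assumed to be empty strings
df = pd.DataFrame({
    "name": ["John", "Steve", "Debs", "Mandy", "Colin"],
    "start": ["2018-11-09 00:00:00", "1990-09-03 00:00:00", "1977-09-07 00:00:00",
              "2009-01-09 00:00:00", "1993-08-22 00:00:00"],
    "end": ["2012-03-01 00:00:00", "", "2012-07-02 00:00:00", "", "2002-06-03 00:00:00"],
})

for col in ["start", "end"]:
    # format describes the input strings; blanks and bad values become NaT
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Any comparison involving NaT evaluates to False
reference_error = df["start"] > df["end"]
print(df[reference_error])
```

Because comparisons with NaT are False, rows with a missing end date are never flagged; if those should be reported separately, check df["end"].isna() as well.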
