Date and time conversion in Python pandas

A .csv file has a date column. When read into a pandas DataFrame and displayed, the date and time are displayed as:
2021-06-30 19:39:25
The correct date is 30-06-2021 19:39:25
How can this be changed?

Using the pandas.to_datetime method with an explicit format to convert the dates will be more reliable:
df['Date'] = pd.to_datetime(df['Date'] , format = '%d-%m-%Y %H:%M:%S')
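A minimal sketch of the idea, assuming the CSV stores the values day-first as text (the sample values and the Date_text column name are made up for illustration). The explicit format only controls how the strings are parsed; a datetime64 column is always displayed in ISO order, so a day-first display has to be produced as text with dt.strftime.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV column.
df = pd.DataFrame({'Date': ['30-06-2021 19:39:25', '22-07-2021 19:39:25']})

# Parse the day-first strings; the format must match the text as stored.
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y %H:%M:%S')

# Keep the datetime column for calculations and build a separate
# string column for day-first display.
df['Date_text'] = df['Date'].dt.strftime('%d-%m-%Y %H:%M:%S')
print(df)
```

Keeping the parsed column alongside the text column means sorting and date arithmetic still work on Date, while Date_text is used only for output.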

Try strftime:
>>> date.strftime('%d-%m-%Y %H:%M:%S')
'30-06-2021 19:39:25'

Try the following:
df = pd.DataFrame({'Date': ['2021-06-30 19:39:25', '2021-07-22 19:39:25', '2021-08-18 19:39:25']})
# convert the `Date` column to datetime
df['Date'] = pd.to_datetime(df['Date'])
Solution:
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y %H:%M:%S')
If the above doesn't work, use the following:
# now convert to the desired format (the result is a column of strings)
df['Date'] = df['Date'].dt.strftime('%d-%m-%Y %H:%M:%S')
print(df['Date'])
0 30-06-2021 19:39:25
1 22-07-2021 19:39:25
2 18-08-2021 19:39:25
Name: Date, dtype: object

Related

How to convert all column values to dates?

I'm trying to convert all data in a column from the below to dates.
Event Date
2020-07-16 00:00:00
31/03/2022, 26/11/2018, 31/01/2028
This is just a small section of the data - there are more columns/rows.
I've tried to split out the cells with multiple values using the below:
df["Event Date"] = df["Event Date"].str.replace(' ', '')
df["Event Date"] = df["Event Date"].str.split(",")
df= df.explode("Event Date")
The issue with this is it sets any cell without a ',' e.g. '2020-07-16 00:00:00' to NaN.
Is there any way to separate the values with a ',' and set the entire column to date types?
You can use a combination of split and explode to separate the dates, and then pass infer_datetime_format=True to pd.to_datetime to convert the mixed date formats:
df = df.assign(dates=df['dates'].str.split(',')).explode('dates')
df
Out[18]:
dates
0 2020-07-16 00:00:00
1 31/03/2022
1 26/11/2018
1 31/01/2028
df.dates = pd.to_datetime(df.dates, infer_datetime_format=True)
df.dates
Out[20]:
0 2020-07-16
1 2022-03-31
1 2018-11-26
1 2028-01-31
Name: dates, dtype: datetime64[ns]
Here is a proposition with pandas.Series.str.split and pandas.Series.explode:
s_dates = (
    df["Event Date"]
    .str.split(",")
    .explode(ignore_index=True)
    .apply(pd.to_datetime, dayfirst=True)
)
Output :
0 2020-07-16
1 2022-03-31
2 2018-11-26
3 2028-01-31
Name: Event Date, dtype: datetime64[ns]
Your example table shows mixed date formats in each row. The idea is to try one date parsing technique and then try another if it fails. Using loops and having such wide variations of data types are red flags in a script's design. I recommend using datetime and dateutil to handle the dates.
from datetime import datetime
from dateutil import parser

date_strings = ["2020-07-16 00:00:00", "31/03/2022, 26/11/2018, 31/01/2028"]  # Get these from your table.
parsed_dates = []
for date_string in date_strings:
    try:
        # First attempt: a single ISO-style timestamp parsed with strptime
        date_object = datetime.strptime(date_string, "%Y-%m-%d %H:%M:%S")
        parsed_dates.append(date_object)
    except ValueError:
        # Fallback: split the comma-separated values and parse each with dateutil
        for date_str in date_string.split(","):
            date_object = parser.parse(date_str.strip(), dayfirst=True)
            parsed_dates.append(date_object)
print(parsed_dates)
Try the code on Trinket: https://trinket.io/python3/95c0d14271
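The same "try one format, then fall back to another" idea can also be sketched without an explicit loop. This is only an illustration under two assumptions: the column contains exactly the two formats shown in the question, and the cell without a comma is a real Timestamp rather than text, which would explain why .str.split turned it into NaN (the .str methods return NaN for non-string cells). The sample DataFrame and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample: one cell holds a real Timestamp, the other a
# comma-separated string of day-first dates.
df = pd.DataFrame({"Event Date": [pd.Timestamp("2020-07-16"),
                                  "31/03/2022, 26/11/2018, 31/01/2028"]})

values = (
    df["Event Date"]
    .astype(str)                   # make every cell a string before using .str
    .str.split(",")
    .explode(ignore_index=True)
    .str.strip()                   # drop the space left after each comma
)

# Parse with each known format; values that do not match become NaT.
iso = pd.to_datetime(values, format="%Y-%m-%d %H:%M:%S", errors="coerce")
dmy = pd.to_datetime(values, format="%d/%m/%Y", errors="coerce")

# Use the ISO result where it parsed, otherwise the day-first result.
parsed = iso.fillna(dmy)
print(parsed)
```

Any value that matches neither format stays NaT, so parsed.isna() can be used to find entries that still need attention.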

Cannot remove timestamp in datetime

I have a date column with dtype object and values formatted like 31-Mar-20. I tried to convert it with datetime.strptime into datetime64[D] in the format 2020-03-31, but whatever I try does not work; I have tried some methods from this and this. It does turn my column into datetime64, but with a timestamp in it, which I don't want. I need it to be a date without a timestamp, in the format 2020-03-31. This is my code:
dates = [datetime.datetime.strptime(ts,'%d-%b-%y').strftime('%Y-%m-%d')
for ts in df['date']]
df['date']= pd.DataFrame({'date': dates})
df = df.sort_values(by=['date'])
This approach might work -
import pandas as pd
df = pd.DataFrame({'dates': ['20-Mar-2020', '21-Mar-2020', '22-Mar-2020']})
df
dates
0 20-Mar-2020
1 21-Mar-2020
2 22-Mar-2020
df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%Y').dt.date
df
dates
0 2020-03-20
1 2020-03-21
2 2020-03-22
df['date'] = pd.to_datetime(df['date'], format="%d-%b-%y")
This converts the column to datetime. When you look at df, the values display as 2020-03-31 like you want; however, they are all datetime objects, so if you extract one value with df['date'][0] you will see Timestamp('2020-03-31 00:00:00').
If you want to convert them into dates, you can do:
df['date'] = [df_datetime.date() for df_datetime in df['date'] ]
There is probably a better way of doing this step.
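One vectorised alternative to the list comprehension is the .dt.date accessor. The sketch below assumes an English locale (so that %b matches "Mar") and uses made-up sample values; note that the resulting column holds plain Python date objects, not datetime64 values.

```python
import datetime
import pandas as pd

# Hypothetical sample in the question's 31-Mar-20 style.
df = pd.DataFrame({"date": ["31-Mar-20", "01-Apr-20", "15-Jan-20"]})

# Parse the text, then keep only the calendar date of each value.
parsed = pd.to_datetime(df["date"], format="%d-%b-%y")
df["date"] = parsed.dt.date

# datetime.date objects compare chronologically, so sorting still works.
df = df.sort_values(by="date")
print(df)
```

If date arithmetic or resampling is needed later, it may be simpler to keep the datetime64 column and only format it as text for output.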

Problem converting column with date info as object to datetime

I have a column with birth dates stored as object. The problem is that when I try to convert it to datetime, it always displays the following message:
time data '27126' does not match format '%d/%m/%Y' (match)
date
0 05/06/1980
1 31/07/1947
2 07/01/1963
3 26/03/1973
4 30/01/1991
5 12/12/1991
6 13/08/1987
7 10/01/1944
8 23/06/1965
9 08/10/1995
So far I've tried the following:
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
df['date'] = df['date'].apply(lambda x: datetime.datetime.strptime(x, "%d/%m/%Y").strftime("%Y-%m-%d"))
df['date'] = pd.to_datetime(df['date'].str.strip(), format='%d/%m/%Y')
Add the parameter errors='coerce' to convert non-matching values to missing values, here NaT:
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')
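Because errors='coerce' silently replaces bad values, it can help to list them before overwriting the column. A small sketch, using a made-up sample that contains the '27126' value from the error message (the names parsed and bad_rows are illustrative):

```python
import pandas as pd

# Hypothetical sample: two valid dates and the value from the error message.
df = pd.DataFrame({"date": ["05/06/1980", "27126", "31/07/1947"]})

# Parse without overwriting the original text column.
parsed = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")

# Rows that had a value but failed to parse are the ones to inspect.
bad_rows = df.loc[parsed.isna() & df["date"].notna(), "date"]
print(bad_rows)

# Once the bad values are understood, store the parsed result.
df["date"] = parsed
```

Looking at bad_rows first shows whether the unparsed entries are typos, a second date format, or something else that needs separate handling.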

TypeError: Passing PeriodDtype data is invalid. Use `data.to_timestamp()` instead

How can I convert a date column with format of 2014-09 to format of 2014-09-01 00:00:00.000? The previous format is converted from df['date'] = pd.to_datetime(df['date']).dt.to_period('M').
I use df['date'] = pd.to_datetime(df['date']).dt.strftime('%Y-%m-%d %H:%M:%S.000'), but it generates an error: TypeError: Passing PeriodDtype data is invalid. Use data.to_timestamp() instead. I also try with pd.to_datetime(df['date']).dt.strftime('%Y-%m'), it generates same error.
The first idea is to convert the periods to timestamps with Series.dt.to_timestamp and then use Series.dt.strftime:
print (df)
date
0 2014-09
print (df.dtypes)
date period[M]
dtype: object
df['date'] = df['date'].dt.to_timestamp('s').dt.strftime('%Y-%m-%d %H:%M:%S.000')
print (df)
date
0 2014-09-01 00:00:00.000
Or simply append the same suffix to each value:
df['date'] = df['date'].dt.to_timestamp('s').dt.strftime('%Y-%m-%d %H:%M:%S').add('.000')
print (df)
date
0 2014-09-01 00:00:00.000
Or:
df['date'] = df['date'].dt.strftime('%Y-%m').add('-01 00:00:00.000')
print (df)
date
0 2014-09-01 00:00:00.000
Use %f for the fractional seconds (it parses microseconds, so the three-digit milliseconds here are accepted too):
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S.%f')
sample code is
df = pd.DataFrame({
'Date': ['2014-09-01 00:00:00.000']
})
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S.%f')
df
which gives you the following output
Date
0 2014-09-01
To convert 2014-09 stored as a Period to 2014-09-01 00:00:00.000, we can do the following:
df = pd.DataFrame({
'date': ['2014-09-05']
})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['date'] = pd.to_datetime(df['date']).dt.to_period("M")
df['date'] = df['date'].dt.strftime('%Y-%m-01 00:00:00.000')
df
Try stripping the last 3 digits
print(pd.to_datetime(df['date']).dt.strftime('%Y-%m-%d %H:%M:%S.%f')[0][:-3])
Output:
2014-09-01 00:00:00.000
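The slicing above acts on a single element. A sketch that applies the same "format with %f, then drop the last three digits" idea to the whole column, starting from period data as in the question (the sample frame and the text variable are made up for illustration):

```python
import pandas as pd

# Hypothetical sample: a monthly period column, as produced in the question.
df = pd.DataFrame({"date": pd.to_datetime(["2014-09-05"]).to_period("M")})

# Periods must become timestamps first (start of the period by default).
stamps = df["date"].dt.to_timestamp()

# %f writes six digits of microseconds; .str[:-3] trims each string
# in the column down to milliseconds.
text = stamps.dt.strftime("%Y-%m-%d %H:%M:%S.%f").str[:-3]
print(text)
```

Using .str[:-3] instead of [0][:-3] keeps the result a Series, so it can be assigned back to the column.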
In the event the other answers don't work, you could try
df.index = pd.DatetimeIndex(df.date).to_period('s')
df.index
which should show a PeriodIndex with the frequency set to 's'.

Datetime Pandas Function Only Converting Single Series

I am working with several pandas dataframes, each of which have timestamps in a format like: "2018-01-01 00:00:00 UTC". I wrote a function to be able to scan every single one of the columns of the dataframe and change the columns that have data in this format. Here's the function:
def utc_converter(dataframe, timezone):
    columns = dataframe.columns.tolist()
    for column in columns:
        try:
            s = pd.to_datetime(dataframe[column], format='%Y-%m-%d %H:%M:%S UTC', utc=True)
        except ValueError:
            continue
        s.dt.tz_convert(timezone)
        s = s.dt.strftime('%m/%d/%Y %H:%M:%S')
        dataframe[column] = s
    dataframe = dataframe.replace(to_replace=pd.NaT, value=np.nan)
    return dataframe
For some reason, whenever I run the function on a dataframe, it's only catching the first column, and it's not looping through any of the rest. Anyone have any idea what I've done wrong? I've been scratching my head for a bit now.
Thanks!
You can use pd.to_datetime(), with strftime() to re-format your dates:
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S UTC', utc=True).dt.strftime('%m/%d/%Y %H:%M:%S')
Note that this will return a column of type str, so to convert back to datetime simply do:
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y %H:%M:%S')
You can just consider the first row to determine which columns are in scope. Then use pd.to_datetime on selected columns via pd.DataFrame.apply. Here's a demo:
df = pd.DataFrame([['2018-01-01 00:00:00 UTC', 0, 341.3214, 'test1',
'2019-01-01 00:00:00 UTC'],
['2015-01-01 00:00:00 UTC', 46, 235.54, 'test2',
'2020-01-01 00:00:00 UTC']],
columns=['date1', 'int', 'float', 'string', 'date2'])
dt_format = '%Y-%m-%d %H:%M:%S UTC'
L = [pd.to_datetime(i, errors='coerce', format=dt_format) for i in df.iloc[0].values]
dt_cols = df.columns[pd.Series(L).notnull()]
df[dt_cols] = df[dt_cols].apply(pd.to_datetime, format=dt_format)
Result:
print(df)
       date1  int     float string      date2
0 2018-01-01    0  341.3214  test1 2019-01-01
1 2015-01-01   46  235.5400  test2 2020-01-01
print(df.dtypes)
date1     datetime64[ns]
int                int64
float            float64
string            object
date2     datetime64[ns]
dtype: object
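For comparison, here is a hedged sketch of the question's utc_converter that loops over every column. It is not the asker's original code: it assumes the return statement belongs after the loop, it keeps the result of tz_convert (which returns a new Series rather than modifying in place), and it skips non-string columns up front. The sample frame mirrors the demo above; UTC_FORMAT and the variable names are illustrative.

```python
import pandas as pd

UTC_FORMAT = '%Y-%m-%d %H:%M:%S UTC'

def utc_converter(dataframe, timezone):
    """Return a copy with every 'YYYY-MM-DD HH:MM:SS UTC' column
    converted to `timezone` and formatted as MM/DD/YYYY HH:MM:SS text."""
    out = dataframe.copy()
    for column in out.columns:
        # Only string columns can hold timestamps in this text format.
        if not pd.api.types.is_string_dtype(out[column]):
            continue
        try:
            s = pd.to_datetime(out[column], format=UTC_FORMAT, utc=True)
        except (ValueError, TypeError):
            continue  # not a timestamp column, leave it unchanged
        # tz_convert returns a new Series, so its result must be used.
        out[column] = s.dt.tz_convert(timezone).dt.strftime('%m/%d/%Y %H:%M:%S')
    return out  # returned only after all columns were visited

df = pd.DataFrame([['2018-01-01 00:00:00 UTC', 0, 341.3214, 'test1',
                    '2019-01-01 00:00:00 UTC'],
                   ['2015-01-01 00:00:00 UTC', 46, 235.54, 'test2',
                    '2020-01-01 00:00:00 UTC']],
                  columns=['date1', 'int', 'float', 'string', 'date2'])

result = utc_converter(df, 'America/New_York')
print(result)
```

Working on a copy leaves the caller's DataFrame untouched, which makes it easier to compare the input and output while debugging.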
