I am working with several pandas dataframes, each of which has timestamps in a format like: "2018-01-01 00:00:00 UTC". I wrote a function that scans every column of the dataframe and converts the columns that hold data in this format. Here's the function:
def utc_converter(dataframe, timezone):
    columns = dataframe.columns.tolist()
    for column in columns:
        try:
            s = pd.to_datetime(dataframe[column], format='%Y-%m-%d %H:%M:%S UTC', utc=True)
        except ValueError:
            continue
        s.dt.tz_convert(timezone)
        s = s.dt.strftime('%m/%d/%Y %H:%M:%S')
        dataframe[column] = s
    dataframe = dataframe.replace(to_replace=pd.NaT, value=np.nan)
    return dataframe
For some reason, whenever I run the function on a dataframe, it's only catching the first column, and it's not looping through any of the rest. Anyone have any idea what I've done wrong? I've been scratching my head for a bit now.
Thanks!
You can use pd.to_datetime(), with strftime() to re-format your dates:
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S UTC', utc=True).dt.strftime('%m/%d/%Y %H:%M:%S')
Note that this will return a column of type str, so to convert back to datetime simply do:
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y %H:%M:%S')
You can just consider the first row to determine which columns are in scope. Then use pd.to_datetime on selected columns via pd.DataFrame.apply. Here's a demo:
df = pd.DataFrame([['2018-01-01 00:00:00 UTC', 0, 341.3214, 'test1',
                    '2019-01-01 00:00:00 UTC'],
                   ['2015-01-01 00:00:00 UTC', 46, 235.54, 'test2',
                    '2020-01-01 00:00:00 UTC']],
                  columns=['date1', 'int', 'float', 'string', 'date2'])
dt_format = '%Y-%m-%d %H:%M:%S UTC'
L = [pd.to_datetime(i, errors='coerce', format=dt_format) for i in df.iloc[0].values]
dt_cols = df.columns[pd.Series(L).notnull()]
df[dt_cols] = df[dt_cols].apply(pd.to_datetime, format=dt_format)
Result:
print(df)
       date1  int     float string      date2
0 2018-01-01    0  341.3214  test1 2019-01-01
1 2015-01-01   46  235.5400  test2 2020-01-01
print(df.dtypes)
date1     datetime64[ns]
int                int64
float            float64
string            object
date2     datetime64[ns]
dtype: object
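Regarding the original utc_converter function: the indentation was lost in the post, so the exact cause can't be confirmed, but two things would produce the behavior described. If `return dataframe` sits inside the `for` loop, the function exits after the first column that parses. Separately, `s.dt.tz_convert(timezone)` returns a new Series, so its result has to be assigned or the conversion is silently dropped. A minimal corrected sketch under those assumptions (the sample frame and the 'America/New_York' timezone are made up for illustration):

```python
import pandas as pd


def utc_converter(dataframe, timezone):
    """Reformat every column whose values all look like '2018-01-01 00:00:00 UTC'."""
    dataframe = dataframe.copy()  # avoid mutating the caller's frame
    for column in dataframe.columns:
        try:
            s = pd.to_datetime(dataframe[column],
                               format='%Y-%m-%d %H:%M:%S UTC', utc=True)
        except (ValueError, TypeError):
            continue  # not a timestamp column, leave it untouched
        # tz_convert returns a new Series, so keep the result
        s = s.dt.tz_convert(timezone)
        dataframe[column] = s.dt.strftime('%m/%d/%Y %H:%M:%S')
    return dataframe  # outside the loop, so every column is visited


# Hypothetical sample data, mirroring the demo frame above
df = pd.DataFrame({'date1': ['2018-01-01 00:00:00 UTC', '2015-01-01 00:00:00 UTC'],
                   'int': [0, 46],
                   'date2': ['2019-01-01 00:00:00 UTC', '2020-01-01 00:00:00 UTC']})
out = utc_converter(df, 'America/New_York')
print(out)
```

Note that the converted columns hold strings after strftime, so keep a datetime copy if you still need to do date arithmetic.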
Related
A .csv file has a date column. When read into a pandas DataFrame and displayed, the date and time are displayed as:
2021-06-30 19:39:25
The correct date is 30-06-2021 19:39:25
How can this be changed?
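Using pandas.to_datetime to parse the column first is more reliable. Its format argument describes how the input strings are written (year first here); the day-first display then comes from dt.strftime:
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S').dt.strftime('%d-%m-%Y %H:%M:%S')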
Try strftime:
>>> date.strftime('%d-%m-%Y %H:%M:%S')
'30-06-2021 19:39:25'
Try the following:
df = pd.DataFrame({'Date':['2021-06-30 19:39:25', '2021-07-22 19:39:25', '2021-08-18 19:39:25']})
# convert `Date` column to datetime
df['Date'] = pd.to_datetime(df['Date'])
Solution:
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y %H:%M:%S')
If the above doesn't change the displayed format (format only describes how input strings are parsed), use the following:
# Now convert to the desired format
df['Date'] = df['Date'].dt.strftime('%d-%m-%Y %H:%M:%S')
print(df['Date'])
0 30-06-2021 19:39:25
1 22-07-2021 19:39:25
2 18-08-2021 19:39:25
Name: Date, dtype: object
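Worth spelling out here: the `format` argument of `pd.to_datetime` describes how the input strings are laid out, while `dt.strftime` controls how parsed values are rendered as text. A small sketch with made-up values (the `display` column name is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2021-06-30 19:39:25', '2021-07-22 19:39:25']})

# Parse: `format` must match the text as it appears in the data (year first here)
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S')

# Display: strftime returns strings, so keep the datetime column for calculations
df['display'] = df['Date'].dt.strftime('%d-%m-%Y %H:%M:%S')

print(df['display'].tolist())
```

Keeping both columns is a design choice: the datetime column stays sortable and supports date arithmetic, and the string column is only for output.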
I am trying to convert dates in any format to "%Y-%m-%d". In my code, I am getting a TypeError saying that strftime for datetime.date objects doesn't apply to a 'str' object.
def open_csv2():
    browse_text2.set("Proccessing CSV...")
    csv2file = filedialog.askopenfilename(parent=root, title="Select the CSV", filetypes=[("Text file", "*.csv")])
    if csv2file:
        df = pd.read_csv(csv2file, usecols=['date'])
        dates = df['date']
        new_dates = []
        for i in dates:
            n_date = datetime.strftime(i,"%Y-%m-%d")
            new_dates.append({'date':n_date})
        new_dates.to_csv('__newDates.csv', index=False)
    root.quit()
I am getting this error:
TypeError: descriptor 'strftime' for 'datetime.date' objects doesn't apply to a 'str' object.
Thank you!
The i in datetime.strftime(i,"%Y-%m-%d") must be a datetime object, not a str.
Your df['date'] column is probably stored as str, since that is how the values arrive when the dataframe is read from a CSV. Please check the type of the data in the date column.
If you want to convert the strings in df['date'] to datetime objects, use datetime.strptime with a format string that matches how the dates are written in the file, instead of the strftime function. Once you have datetime objects, strftime("%Y-%m-%d") produces the output format you want.
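The two functions are easy to mix up, so here is a minimal sketch of the round trip. The sample string and its day-month-year layout are assumptions for illustration; use whatever layout your file actually has.

```python
from datetime import datetime

raw = '28-01-2020'  # a string, as it would arrive from a CSV (assumed layout)

# strptime: string -> datetime ("p" for parse); the format describes the input
parsed = datetime.strptime(raw, '%d-%m-%Y')

# strftime: datetime -> string ("f" for format); the format describes the output
formatted = parsed.strftime('%Y-%m-%d')

print(formatted)
```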
Try the example below:
Sample data:
print(df)
date
0 28-01-2020
1 15-02-2020
2 15-03-2020
3 25-03-2020
4 01-04-2020
5 30-05-2020
Check the data type:
As you can see, this is an object dtype and not a datetime format.
print(df.dtypes)
date object
dtype: object
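Datetime conversion:
Just convert it to datetime and you will find the datatype changed. Because the strings are day-first, pass the format explicitly so that values such as 01-04-2020 are not read as month-first.
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
print(df)
        date
0 2020-01-28
1 2020-02-15
2 2020-03-15
3 2020-03-25
4 2020-04-01
5 2020-05-30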
print(df.dtypes)
date datetime64[ns]
dtype: object
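A note on day-first strings: a value such as 01-04-2020 is ambiguous when no format is given, and for a single string pandas reads it month-first by default. A short sketch of the three ways to parse it (the variable names are just for illustration):

```python
import pandas as pd

ambiguous = '01-04-2020'  # meant as 1 April 2020 in day-month-year notation

default = pd.to_datetime(ambiguous)                      # month-first guess
day_first = pd.to_datetime(ambiguous, dayfirst=True)     # day-first hint
explicit = pd.to_datetime(ambiguous, format='%d-%m-%Y')  # no guessing at all

print(default.date(), day_first.date(), explicit.date())
```

An explicit format is usually the safer choice, since it fails loudly on strings that don't match instead of guessing.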
Also look at the help section, ie help(pd.to_datetime):
Examples
--------
Assembling a datetime from multiple columns of a DataFrame. The keys can be
common abbreviations like ['year', 'month', 'day', 'minute', 'second',
'ms', 'us', 'ns']) or plurals of the same
>>> df = pd.DataFrame({'year': [2015, 2016],
... 'month': [2, 3],
... 'day': [4, 5]})
>>> pd.to_datetime(df)
0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]
Can anyone solve this problem? I am trying to convert a Date column of dtype object to a datetime string format in Python, from 'YY-mm-dd' to 'YY/mm/dd 00:00'. The dataset is shown below. I have tried every option I could find, such as:
energy_df['Date'] = pd.to_datetime(energy_df['Date'])
energy_df['month'] = energy_df['Date'].dt.month.astype(int)
energy_df['day_of_month'] = energy_df['Date'].dt.day.astype(int)
energy_df['day_of_week'] = energy_df['Date'].dt.dayofweek.astype(int)
energy_df['hour_of_day'] = energy_df['Hours']
selected_columns = ['Date', 'day_of_week', 'hour_of_day', 'Avg Specific Humidity[g/Kg]']
energy_df = energy_df[selected_columns]
Dataset image:
Convert the 'date' column to dtype datetime, the 'hour' column to dtype timedelta, add them together, and format to string.
Ex:
import pandas as pd
# some dummy input...
df = pd.DataFrame({'date': ['2015-01-01', '2015-01-01', '2015-01-01'],
'hour': [1, 2, 3]})
# to datetime / timedelta...
df['datetime'] = pd.to_datetime(df['date']) + pd.to_timedelta(df['hour'], unit='h')
# and format to string...
df['timestamp'] = df['datetime'].dt.strftime('%Y/%m/%d %H:%M')
# will give you:
df
         date  hour            datetime         timestamp
0  2015-01-01     1 2015-01-01 01:00:00  2015/01/01 01:00
1  2015-01-01     2 2015-01-01 02:00:00  2015/01/01 02:00
2  2015-01-01     3 2015-01-01 03:00:00  2015/01/01 03:00
I have a date column with dtype object, in the format 31-Mar-20. I tried to convert it with datetime.strptime into datetime64[D] with the format 2020-03-31, but nothing I have tried works; I have tried some methods from this and this. It does turn my column into datetime64, but with a time component in it, which I don't want. I need a date without the time, in the format 2020-03-31. This is my code:
dates = [datetime.datetime.strptime(ts,'%d-%b-%y').strftime('%Y-%m-%d')
         for ts in df['date']]
df['date']= pd.DataFrame({'date': dates})
df = df.sort_values(by=['date'])
This approach might work -
import pandas as pd
df = pd.DataFrame({'dates': ['20-Mar-2020', '21-Mar-2020', '22-Mar-2020']})
df
dates
0 20-Mar-2020
1 21-Mar-2020
2 22-Mar-2020
df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%Y').dt.date
df
dates
0 2020-03-20
1 2020-03-21
2 2020-03-22
df['date'] = pd.to_datetime(df['date'], format="%d-%b-%y")
This converts the column to datetime. When you look at df, the values display as 2020-03-31 like you want; however, they are all datetime objects, so if you extract one value with df['date'][0] you see Timestamp('2020-03-31 00:00:00').
If you want to convert them into dates, you can do:
df['date'] = [df_datetime.date() for df_datetime in df['date']]
There is probably a better way of doing this step.
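On the better way for that last step: a datetime64 column always stores a time component (midnight here), and pandas only hides it when printing. Here is a sketch of two options, assuming strings like the ones in the question. `.dt.date` gives plain `datetime.date` objects in an object-dtype column, while keeping datetime64 and formatting only on output preserves fast sorting and date arithmetic.

```python
import pandas as pd

df = pd.DataFrame({'date': ['31-Mar-20', '01-Feb-20', '15-Jan-20']})

# Parse once; the column is datetime64 with every time set to midnight
df['date'] = pd.to_datetime(df['date'], format='%d-%b-%y')
df = df.sort_values(by='date').reset_index(drop=True)

# Option 1: plain datetime.date objects (object dtype, no time part at all)
as_dates = df['date'].dt.date

# Option 2: keep datetime64 and only render as text when exporting or printing
as_text = df['date'].dt.strftime('%Y-%m-%d')

print(as_text.tolist())
```

Sorting before formatting matters: sorting the datetime column is chronological, whereas sorting day-first strings would not be.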
How can I convert a date column with the format 2014-09 to the format 2014-09-01 00:00:00.000? The current format was produced by df['date'] = pd.to_datetime(df['date']).dt.to_period('M').
I used df['date'] = pd.to_datetime(df['date']).dt.strftime('%Y-%m-%d %H:%M:%S.000'), but it raises an error: TypeError: Passing PeriodDtype data is invalid. Use data.to_timestamp() instead. I also tried pd.to_datetime(df['date']).dt.strftime('%Y-%m'), which raises the same error.
The first idea is to convert the periods to timestamps with Series.dt.to_timestamp and then use Series.dt.strftime:
print (df)
date
0 2014-09
print (df.dtypes)
date period[M]
dtype: object
df['date'] = df['date'].dt.to_timestamp('s').dt.strftime('%Y-%m-%d %H:%M:%S.000')
print (df)
date
0 2014-09-01 00:00:00.000
Or simply append the fractional part as a constant, since it is the same for every value:
df['date'] = df['date'].dt.to_timestamp('s').dt.strftime('%Y-%m-%d %H:%M:%S').add('.000')
print (df)
date
0 2014-09-01 00:00:00.000
Or:
df['date'] = df['date'].dt.strftime('%Y-%m').add('-01 00:00:00.000')
print (df)
date
0 2014-09-01 00:00:00.000
Use %f for the fractional seconds (it matches up to six digits, so it covers milliseconds as well as microseconds):
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d %H:%M:%S.%f')
Sample code:
df = pd.DataFrame({
'Date': ['2014-09-01 00:00:00.000']
})
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S.%f')
df
which gives you the following output
Date
0 2014-09-01
To convert 2014-09 as a Period to 2014-09-01 00:00:00.000, we can do as follows:
df = pd.DataFrame({
    'date': ['2014-09-05']
})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['date'] = pd.to_datetime(df['date']).dt.to_period("M")
df['date'] = df['date'].dt.strftime('%Y-%m-01 00:00:00.000')
df
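Try stripping the last 3 digits of the microseconds, after converting the periods to timestamps (pd.to_datetime does not accept period data):
print(df['date'].dt.to_timestamp().dt.strftime('%Y-%m-%d %H:%M:%S.%f')[0][:-3])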
Output:
2014-09-01 00:00:00.000
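The same trimming can be applied to a whole column rather than one value. This is a sketch with made-up timestamps, assuming the column is already datetime64 (for period data, call `.dt.to_timestamp()` first); `%f` always renders six digits, so dropping the last three characters leaves milliseconds.

```python
import pandas as pd

# Hypothetical values, one with a non-zero fractional part
ts = pd.Series([pd.Timestamp(2014, 9, 1, 0, 0, 0, 123456),
                pd.Timestamp(2014, 9, 1, 12, 30)])

# %f always renders six digits (microseconds)
micro = ts.dt.strftime('%Y-%m-%d %H:%M:%S.%f')

# Dropping the last three characters leaves milliseconds (truncated, not rounded)
milli = micro.str[:-3]

print(milli.tolist())
```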
In the event the other answers don't work, you could try
df.index = pd.DatetimeIndex(df.date).to_period('s')
df.index
This should show the index with the frequency set to 's' (note that to_period returns a PeriodIndex rather than a DatetimeIndex).