pandas save date in ISO format? - python

I'm trying to generate a pandas DataFrame that uses a date_range as its index, and then save it to a CSV file so that the dates are written in ISO-8601 format.
import pandas as pd
import numpy as np
from pandas import DataFrame, Series
NumberOfSamples = 10
dates = pd.date_range('20130101',periods=NumberOfSamples,freq='90S')
df3 = DataFrame(index=dates)
df3.to_csv('dates.txt', header=False)
The current output to dates.txt is:
2013-01-01 00:00:00
2013-01-01 00:01:30
2013-01-01 00:03:00
2013-01-01 00:04:30
...................
I'm trying to get it to look like:
2013-01-01T00:00:00Z
2013-01-01T00:01:30Z
2013-01-01T00:03:00Z
2013-01-01T00:04:30Z
....................

Use datetime.strftime and call map on the index:
In [72]:
NumberOfSamples = 10
import datetime as dt
dates = pd.date_range('20130101',periods=NumberOfSamples,freq='90S')
df3 = pd.DataFrame(index=dates)
df3.index = df3.index.map(lambda x: dt.datetime.strftime(x, '%Y-%m-%dT%H:%M:%SZ'))
df3
Out[72]:
Empty DataFrame
Columns: []
Index: [2013-01-01T00:00:00Z, 2013-01-01T00:01:30Z, 2013-01-01T00:03:00Z, 2013-01-01T00:04:30Z, 2013-01-01T00:06:00Z, 2013-01-01T00:07:30Z, 2013-01-01T00:09:00Z, 2013-01-01T00:10:30Z, 2013-01-01T00:12:00Z, 2013-01-01T00:13:30Z]
Alternatively, and better in my view (thanks to @unutbu), you can pass a format specifier to to_csv:
df3.to_csv('dates.txt', header=False, date_format='%Y-%m-%dT%H:%M:%SZ')
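For reference, a minimal end-to-end sketch of the to_csv approach (the file name dates.txt is taken from the question):
import pandas as pd

dates = pd.date_range('20130101', periods=10, freq='90S')
df3 = pd.DataFrame(index=dates)

# Let to_csv do the formatting; the index itself stays a DatetimeIndex.
df3.to_csv('dates.txt', header=False, date_format='%Y-%m-%dT%H:%M:%SZ')

# dates.txt now begins with:
# 2013-01-01T00:00:00Z
# 2013-01-01T00:01:30Z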

With pd.Index.strftime:
If you're sure that all your dates are UTC, you can hardcode the format:
df3.index = df3.index.strftime('%Y-%m-%dT%H:%M:%SZ')
which gives you 2013-01-01T00:00:00Z and so on. Note that the "Z" denotes UTC!
With pd.Timestamp.isoformat and pd.Index.map:
df3.index = df3.index.map(lambda timestamp: timestamp.isoformat())
This gives you 2013-01-01T00:00:00. If you attach a timezone to your dates first (e.g. by passing tz="UTC" to date_range), you'll get: 2013-01-01T00:00:00+00:00 which also conforms to ISO-8601 but is a different notation. This should work for any dateutil or pytz timezone, leaving no room for ambiguity when clocks switch from daylight saving to standard time.
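For example, a small sketch of the timezone-aware variant (tz='UTC' passed to date_range, as described above):
import pandas as pd

dates = pd.date_range('20130101', periods=3, freq='90S', tz='UTC')
df3 = pd.DataFrame(index=dates)
df3.index = df3.index.map(lambda timestamp: timestamp.isoformat())
print(df3.index[0])
# 2013-01-01T00:00:00+00:00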

Related

convert yyyy-mm-dd to mmm-yy in dataframe python

I am trying to convert the way the month and year are presented.
I have a dataframe as below:
Date
2020-01-31
2020-04-30
2021-05-05
and I want to convert it so that only the month and year are shown.
The output that I am expecting is
Date
Jan-20
Apr-20
May-21
I tried to do it with datetime but it doesn't work.
pd.to_datetime(pd.Series(df['Date']), format='%mmm-%yy')
Use .dt.strftime() to change the display format. %b-%y is the format string for Mmm-YY:
df.Date = pd.to_datetime(df.Date).dt.strftime('%b-%y')
# Date
# 0 Jan-20
# 1 Apr-20
# 2 May-21
Or if Date is the index:
df.index = pd.to_datetime(df.index).dt.strftime('%b-%y')
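As a runnable sketch of the same approach on the question's sample data (column name Date as in the question):
import pandas as pd

df = pd.DataFrame({'Date': ['2020-01-31', '2020-04-30', '2021-05-05']})
# Parse the strings, then render them back as Mmm-YY text.
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%b-%y')
print(df)
#      Date
# 0  Jan-20
# 1  Apr-20
# 2  May-21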
import pandas as pd
date_sr = pd.to_datetime(pd.Series("2020-12-08"))
change_format = date_sr.dt.strftime('%b-%y')
print(change_format)
Reference: https://docs.python.org/3/library/datetime.html (the display format is changed from %Y-%m-%d to %b-%y).
import datetime
df['Date'] = df['Date'].apply(lambda x: datetime.datetime.strptime(x,'%Y-%m-%d').strftime('%b-%y'))
# reference https://docs.python.org/3/library/datetime.html
# %Y-%m-%d changed to ('%b-%y')

How to convert a pandas series of integer timestamp to datetime (using fromtimestamp)? Error = cannot convert the series to <class 'int'>

I have a dataframe with timestamps in integer form. I would like to convert this to datetime, so I can plot the data using mplfinance.plot() (this gives the following error if I try to plot using the timestamps):
Expect data.index as DatetimeIndex
Below is a sample to show the problem:
import datetime as dt
import pandas as pd

data = {'timestamp': [1364774700, 1364775000, 1364775900]}
df = pd.DataFrame(data, columns=['timestamp'])
df['datetime'] = dt.datetime.fromtimestamp(df['timestamp'])
but this produces the error:
TypeError: cannot convert the series to <class 'int'>
Using fromtimestamp on a single timestamp value works fine.
Those integer timestamps are seconds since the Unix epoch ("Unix time"); use pandas.to_datetime with unit='s' to convert df['timestamp'] to a DatetimeIndex:
import pandas as pd
df = pd.DataFrame({'timestamp': [1364774700, 1364775000, 1364775900]})
df = df.set_index(pd.to_datetime(df['timestamp'], unit='s'))
# timestamp
# timestamp
# 2013-04-01 00:05:00 1364774700
# 2013-04-01 00:10:00 1364775000
# 2013-04-01 00:25:00 1364775900
df['datetime'] = pd.to_datetime(df['timestamp'], unit='s')
Should do the trick. My understanding is that pandas datetime and the datetime module are subtly different, and that when working with pandas you're better off using the pandas implementation.
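For context, a small sketch showing why the unit matters: to_datetime's default unit for plain integers is nanoseconds, so passing unit='s' is what maps these values onto 2013 rather than 1970.
import pandas as pd

s = pd.Series([1364774700, 1364775000, 1364775900])
print(pd.to_datetime(s, unit='s').iloc[0])  # 2013-04-01 00:05:00
print(pd.to_datetime(s).iloc[0])            # 1970-01-01 00:00:01.364774700 (treated as nanoseconds)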

How to convert a confusing datetime format to YYYY-MM-DD HH:MM:SS using python?

I am getting a date and time value in the following format:
2019-1-31.23.54. 53. 207000000
2019-1-31.23.51. 27. 111000000
I need to convert it as follows using python pandas:
2019-01-31 23:54:53
2019-01-31 23:51:27
How can I get the expected result?
I tried to delete the microsecond value by converting the text to CSV with a space separator and then dropping the last column, which contains the microseconds.
But I am not able to convert the remaining "2019-1-31.23.54." part.
The code I tried:
import pandas as pd

df = pd.read_csv('file:///C:/prod/orderip.txt', sep=r'\s+', header=None)
df.columns = ['DateTime', 'Extra1', 'Extra2']
df.to_csv('C:/prod/data_out2.csv', index=False)
df = df.drop('Extra1', axis=1)
df = df.drop('Extra2', axis=1)
I need the DateTime column as follows,
2019-01-31 23:54:53
2019-01-31 23:51:27
The standard datetime.strptime should work in this case; the only catch is that the 9-digit fraction must be reduced to 6 digits, since %f accepts at most 6 microsecond digits.
import datetime
print(datetime.datetime.strptime('2019-1-31.23.54. 53. 207000', '%Y-%m-%d.%H.%M. %S. %f'))
The output will be
2019-01-31 23:54:53.207000
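If you would rather not edit the string by hand, a small sketch of trimming the 9-digit fraction down to the 6 digits that %f accepts (the raw value is taken from the question):
import datetime

raw = '2019-1-31.23.54. 53. 207000000'
trimmed = raw[:-3]  # drop the last 3 digits so the fractional part fits %f
print(datetime.datetime.strptime(trimmed, '%Y-%m-%d.%H.%M. %S. %f'))
# 2019-01-31 23:54:53.207000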
Use pd.to_datetime to convert to datetime format of your choice.
Ex:
import pandas as pd
df = pd.read_csv(filename, sep=r'\s+', header=None)
df.columns = [ 'DateTime', 'Extra1','Extra2']
df.drop(['Extra2'], inplace=True, axis=1)
df["DateTime"] = pd.to_datetime(df["DateTime"] + df['Extra1'].astype(int).astype(str), format="%Y-%m-%d.%H.%M.%S")
df.drop(['Extra1'], inplace=True, axis=1)
print(df)
df.to_csv('C:/prod/data_out2.csv',index=False)
#or using df.pop
#df["DateTime"] = pd.to_datetime(df["DateTime"] + df.pop('Extra1').astype(int).astype(str), format="%Y-%m-%d.%H.%M.%S")
#df.to_csv(filename_1,index=False)
Output:
DateTime
0 2019-01-31 23:54:53
1 2019-01-31 23:51:27
You can first try converting to a standard datetime format with pd.to_datetime:
>>> print(dates)
['2019-1-31.23.54.', '2019-1-31.23.51.']
>>> pd.to_datetime(dates, format='%Y-%m-%d.%H.%M.')
DatetimeIndex(['2019-01-31 23:54:00', '2019-01-31 23:51:00'], dtype='datetime64[ns]', freq=None)

pandas.to_datetime with different length date strings

I have a column of timestamps that I would like to convert to datetime in my pandas dataframe. The format of the dates is %Y-%m-%d-%H-%M-%S which pd.to_datetime does not recognize. I have manually entered the format as below:
df['TIME'] = pd.to_datetime(df['TIME'], format = '%Y-%m-%d-%H-%M-%S')
My problem is that some of the times do not have seconds, so they are shorter (format = %Y-%m-%d-%H-%M).
How can I convert all of these strings to datetimes?
I was thinking I could add zero seconds (-0) to the end of my shorter dates but I don't know how to do that.
Round-trip the strings through strftime so they all have the same shape; if pandas can't recognize your custom datetime format, you should provide it explicitly:
import pandas as pd
from functools import partial
df1 = pd.DataFrame({'Date': ['2018-07-02-06-05-23','2018-07-02-06-05']})
newdatetime_fmt = partial(pd.to_datetime, format='%Y-%m-%d-%H-%M-%S')
df1['Clean_Date'] = (df1.Date.str.replace('-','').apply(lambda x: pd.to_datetime(x).strftime('%Y-%m-%d-%H-%M-%S'))
.apply(newdatetime_fmt))
print(df1,df1.dtypes)
output:
Date Clean_Date
0 2018-07-02-06-05-23 2018-07-02 06:05:23
1 2018-07-02-06-05 2018-07-02 06:05:00
Date object
Clean_Date datetime64[ns]
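Alternatively, the questioner's own idea of padding the shorter strings with zero seconds also works; a minimal sketch, assuming the same Date column as above:
import pandas as pd

df1 = pd.DataFrame({'Date': ['2018-07-02-06-05-23', '2018-07-02-06-05']})

# Full strings contain 5 dashes (seconds present); shorter ones contain 4, so append '-00'.
has_seconds = df1['Date'].str.count('-') == 5
df1['Clean_Date'] = pd.to_datetime(
    df1['Date'].where(has_seconds, df1['Date'] + '-00'),
    format='%Y-%m-%d-%H-%M-%S',
)
print(df1['Clean_Date'])
# 0   2018-07-02 06:05:23
# 1   2018-07-02 06:05:00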

Pandas 0.15 DataFrame: Remove or reset time portion of a datetime64

I have imported a CSV file into a pandas DataFrame and have a datetime64 column with values such as:
2014-06-30 21:50:00
I simply want to either remove the time or set the time to midnight:
2014-06-30 00:00:00
What is the easiest way of doing this?
Pandas has a builtin function pd.datetools.normalize_date for that purpose:
df['date_col'] = df['date_col'].apply(pd.datetools.normalize_date)
It's implemented in Cython and does the following:
if PyDateTime_Check(dt):
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)
elif PyDate_Check(dt):
    return datetime(dt.year, dt.month, dt.day)
else:
    raise TypeError('Unrecognized type: %s' % type(dt))
Use the .dt accessor methods, which are vectorized and therefore faster:
# There are better ways of converting it in to datetime column.
# Ignore those to keep it simple
data['date_column'] = pd.to_datetime(data['date_column'])
data['date_column'].dt.date
pd.datetools.normalize_date has been deprecated. Use df['date_col'] = df['date_col'].dt.normalize() instead.
See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.normalize.html
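For illustration, a minimal sketch of dt.normalize() on the question's sample value:
import pandas as pd

s = pd.to_datetime(pd.Series(['2014-06-30 21:50:00']))
print(s.dt.normalize().iloc[0])
# 2014-06-30 00:00:00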
I can think of two ways: assigning just the date() attribute back to the column, or calling replace on each datetime object with hour=0, minute=0:
In [106]:
import io
import pandas as pd
# example data
t = """datetime
2014-06-30 21:50:00"""
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
df
Out[106]:
datetime
0 2014-06-30 21:50:00
In [107]:
# apply a lambda accessing just the date() attribute
df['datetime'] = df['datetime'].apply( lambda x: x.date() )
print(df)
# reset df
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
# call replace with params hour=0, minute=0
df['datetime'] = df['datetime'].apply( lambda x: x.replace(hour=0, minute=0) )
df
datetime
0 2014-06-30
Out[107]:
datetime
0 2014-06-30
Since pd.datetools.normalize_date has been deprecated and you are working with the datetime64 data type, use:
df.your_date_col = df.your_date_col.apply(lambda x: x.replace(hour=0, minute=0, second=0, microsecond=0))
This way you don't need to convert to a pandas datetime first. If the column is already a pandas datetime, use dt.normalize() as shown above:
df.your_date_col = df.your_date_col.dt.normalize()
The fastest way I have found to strip everything but the date is to use the underlying NumPy representation of pandas Timestamps.
import pandas as pd
import numpy as np
dates = pd.to_datetime(['1990-1-1 1:00:11',
'1991-1-1',
'1999-12-31 12:59:59.999'])
dates
DatetimeIndex(['1990-01-01 01:00:11', '1991-01-01 00:00:00',
'1999-12-31 12:59:59.999000'],
dtype='datetime64[ns]', freq=None)
dates = dates.astype(np.int64)
ns_in_day = 24*60*60*np.int64(1e9)
dates //= ns_in_day
dates *= ns_in_day
dates = dates.astype(np.dtype('<M8[ns]'))
dates = pd.Series(dates)
dates
0 1990-01-01
1 1991-01-01
2 1999-12-31
dtype: datetime64[ns]
Note that this might not work when the data have timezone information.
