Pandas date_range: drop nanoseconds - python

I'm trying to create the following date range series with quarterly frequency:
import pandas as pd
pd.date_range(start = "1980", periods = 5, freq = 'Q-Dec')
Which returns
DatetimeIndex(['1980-03-31', '1980-06-30', '1980-09-30', '1980-12-31',
'1981-03-31'],
dtype='datetime64[ns]', freq='Q-DEC')
However, when I output this object to CSV I see the time portion of the series. e.g.
1980-03-31 00:00:00, 1980-06-30 00:00:00
Any idea how I can get rid of the time portion so when I export to a csv it just shows:
1980-03-31, 1980-06-30

You are not seeing nanoseconds; you are just seeing the time portion of the datetimes that pandas created for the DatetimeIndex. You can extract only the date portion with .date().
Code:
import pandas as pd
dr = pd.date_range(start="1980", periods=5, freq='Q-Dec')
print(dr[0].date())
Results:
1980-03-31
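To get the whole index written date-only rather than one element at a time, a small sketch (reusing the question's index; the hypothetical `s` Series just gives `to_csv` something to write) shows two options: format at export time with `date_format`, or swap the index for plain `datetime.date` objects:

```python
import io
import pandas as pd

# rebuild the quarterly index from the question (the freq alias is case-insensitive)
dr = pd.date_range(start="1980", periods=5, freq="Q-DEC")
s = pd.Series(range(5), index=dr)

# Option 1: keep the DatetimeIndex, format only when exporting
buf = io.StringIO()
s.to_csv(buf, header=False, date_format="%Y-%m-%d")
first_line = buf.getvalue().splitlines()[0]  # '1980-03-31,0'

# Option 2: replace the index with plain datetime.date objects
s.index = dr.date
```

Option 1 is usually preferable: the index stays a proper DatetimeIndex in memory, and only the CSV text changes.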

Related

python pandas converting UTC integer to datetime

I am calling some financial data from an API which is storing the time values as (I think) UTC millisecond timestamps (the code below shows one such value).
I cannot seem to convert the entire column into a useable date, I can do it for a single value using the following code so I know this works, but I have 1000's of rows with this problem and thought pandas would offer an easier way to update all the values.
from datetime import datetime
tx = int('1645804609719')/1000
print(datetime.utcfromtimestamp(tx).strftime('%Y-%m-%d %H:%M:%S'))
Any help would be greatly appreciated.
Simply use pandas.DataFrame.apply:
df['date'] = df.date.apply(lambda x: datetime.utcfromtimestamp(int(x)/1000).strftime('%Y-%m-%d %H:%M:%S'))
Another way to do it is by using pd.to_datetime as recommended by Panagiotos in the comments:
df['date'] = pd.to_datetime(df['date'],unit='ms')
You can use to_numeric to convert the column to integers, div to divide it by 1000, and finally a loop over the DataFrame rows with datetime to get the format you want.
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'date': ['1584199972000', '1645804609719'], 'values': [30, 40]})
df['date'] = pd.to_numeric(df['date']).div(1000)
for i in range(len(df)):
    df.iloc[i, 0] = datetime.utcfromtimestamp(df.iloc[i, 0]).strftime('%Y-%m-%d %H:%M:%S')
print(df)
Output:
                  date  values
0  2020-03-14 15:32:52      30
1  2022-02-25 15:56:49      40
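The row-by-row loop above can also be collapsed into a single vectorized pass, combining to_numeric, to_datetime and the .dt.strftime accessor — a sketch on the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({'date': ['1584199972000', '1645804609719'], 'values': [30, 40]})

# epoch milliseconds -> datetime64 -> formatted string, all vectorized
df['date'] = (pd.to_datetime(pd.to_numeric(df['date']), unit='ms')
                .dt.strftime('%Y-%m-%d %H:%M:%S'))
```

On thousands of rows this avoids the Python-level loop entirely.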

How to convert a pandas series of integer timestamp to datetime (using fromtimestamp)? Error = cannot convert the series to <class 'int'>

I have a dataframe with timestamps in integer form. I would like to convert this to datetime, so I can plot the data using mplfinance.plot() (this gives the following error if I try to plot using the timestamps):
Expect data.index as DatetimeIndex
Below is a sample to show the problem:
import datetime as dt
import pandas as pd
data = {'timestamp': [1364774700, 1364775000, 1364775900]}
df = pd.DataFrame(data, columns=['timestamp'])
df['datetime'] = dt.datetime.fromtimestamp(df['timestamp'])
but this produces the error:
TypeError: cannot convert the series to <class 'int'>
Using fromtimestamp on a single timestamp value works fine.
Those integer timestamps are seconds since the Unix epoch ("Unix time"); use pandas.to_datetime with unit='s' specified to convert df['timestamp'] to a DatetimeIndex:
import pandas as pd
df = pd.DataFrame({'timestamp': [1364774700, 1364775000, 1364775900]})
df = df.set_index(pd.to_datetime(df['timestamp'], unit='s'))
# timestamp
# timestamp
# 2013-04-01 00:05:00 1364774700
# 2013-04-01 00:10:00 1364775000
# 2013-04-01 00:25:00 1364775900
df['datetime'] = pd.to_datetime(df['timestamp'], unit='s')
Should do the trick. Pandas' datetime handling and the standard-library datetime module are subtly different; when working with pandas you're generally better off using the pandas implementation.
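The unit argument matters here: without it, pandas treats a plain integer as nanoseconds since the epoch, not seconds. A quick sketch with one of the question's values:

```python
import pandas as pd

# without a unit, the integer is read as *nanoseconds* since the epoch
no_unit = pd.to_datetime(1364774700)              # lands in 1970
# with unit='s' it is read as seconds, which these timestamps are
with_unit = pd.to_datetime(1364774700, unit='s')  # 2013-04-01 00:05:00
```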

Convert csv column from Epoch time to human readable minutes

I have a pandas.DataFrame indexed by time, as seen below. The time is in Epoch time. When I graph the second column these time values display along the x-axis. I want a more readable time in minutes:seconds.
In [13]: print df.head()
Time
1481044277379 0.581858
1481044277384 0.581858
1481044277417 0.581858
1481044277418 0.581858
1481044277467 0.581858
I have tried some pandas functions, and some methods for converting the whole column, I visited: Pandas docs, this question and the cool site.
I am using pandas 0.18.1
If you read your data with read_csv you can use a custom dateparser:
import pandas as pd
#example.csv
'''
Time,Value
1481044277379,0.581858
1481044277384,0.581858
1481044277417,0.581858
1481044277418,0.581858
1481044277467,0.581858
'''
import datetime

def dateparse(time_in_secs):
    # read_csv hands the parser raw strings, so cast before dividing;
    # the Time values are epoch milliseconds, hence the /1000
    time_in_secs = float(time_in_secs) / 1000
    return datetime.datetime.fromtimestamp(time_in_secs)

dtype = {"Value": float}
df = pd.read_csv("example.csv", dtype=dtype, parse_dates=["Time"], date_parser=dateparse)
print(df)
You can convert an epoch timestamp to HH:MM with:
import datetime as dt
hours_mins = dt.datetime.fromtimestamp(1347517370).strftime('%H:%M')
Adding a column to your pandas.DataFrame can be done as (note: if your values are epoch milliseconds, as in the question, divide by 1000 first):
df['H_M'] = pd.Series([dt.datetime.fromtimestamp(int(ts)).strftime('%H:%M')
                       for ts in df['timestamp']]).values
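On any reasonably recent pandas, the same result falls out of a single vectorized call — a sketch using the question's millisecond Time values in a column:

```python
import pandas as pd

df = pd.DataFrame({'Time': [1481044277379, 1481044277384, 1481044277417],
                   'Value': [0.581858] * 3})

# the Time values are epoch *milliseconds*, hence unit='ms';
# then format as minutes:seconds for the plot axis
df['M_S'] = pd.to_datetime(df['Time'], unit='ms').dt.strftime('%M:%S')
```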

Python: How to filter a DataFrame of dates in Pandas by a particular date within a window of some days?

I have a DataFrame of dates and would like to filter for a particular date +- some days.
import pandas as pd
import numpy as np
import datetime
dates = pd.date_range(start="08/01/2009",end="08/01/2012",freq="D")
df = pd.DataFrame(np.random.rand(len(dates), 1)*1500, index=dates, columns=['Power'])
If I select, let's say, date 2010-08-03 and a window of 5 days, the output would be similar to:
>>>
Power
2010-07-29 713.108020
2010-07-30 1055.109543
2010-07-31 951.159099
2010-08-01 1350.638983
2010-08-02 453.166697
2010-08-03 1066.859386
2010-08-04 1381.900717
2010-08-05 107.489179
2010-08-06 1195.945723
2010-08-07 1209.762910
2010-08-08 349.554492
N.B.: The original problem I am trying to accomplish is under Python: Filter DataFrame in Pandas by hour, day and month grouped by year
The function I created to accomplish this is filterDaysWindow and can be used as follows:
import pandas as pd
import numpy as np
import datetime
dates = pd.date_range(start="08/01/2009",end="08/01/2012",freq="D")
df = pd.DataFrame(np.random.rand(len(dates), 1)*1500, index=dates, columns=['Power'])
def filterDaysWindow(df, date, daysWindow):
    """
    Filter a DataFrame by a date within a window of days

    @type df: DataFrame
    @param df: DataFrame of dates
    @type date: datetime.date
    @param date: date to focus on
    @type daysWindow: int
    @param daysWindow: Number of days to perform the days window selection
    @rtype: DataFrame
    @return: Returns a DataFrame with dates within date+-daysWindow
    """
    dateStart = date - datetime.timedelta(days=daysWindow)
    dateEnd = date + datetime.timedelta(days=daysWindow)
    return df[dateStart:dateEnd]
df_filtered = filterDaysWindow(df, datetime.date(2010,8,3), 5)
print(df_filtered)
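The same slice can be written without the datetime module at all, using pandas' own Timestamp and Timedelta types — a sketch under the question's setup:

```python
import pandas as pd
import numpy as np

dates = pd.date_range(start="2009-08-01", end="2012-08-01", freq="D")
df = pd.DataFrame(np.random.rand(len(dates), 1) * 1500, index=dates, columns=['Power'])

center = pd.Timestamp('2010-08-03')
window = pd.Timedelta(days=5)
# .loc slicing on a DatetimeIndex is inclusive at both ends
df_filtered = df.loc[center - window : center + window]
```

This returns the 11 rows from 2010-07-29 through 2010-08-08, matching the example output above.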

pandas save date in ISO format?

I'm trying to generate a pandas DataFrame with a date_range as its index, then save it to a CSV file so that the dates are written in ISO-8601 format.
import pandas as pd
import numpy as np
from pandas import DataFrame, Series
NumberOfSamples = 10
dates = pd.date_range('20130101',periods=NumberOfSamples,freq='90S')
df3 = DataFrame(index=dates)
df3.to_csv('dates.txt', header=False)
The current output to dates.txt is:
2013-01-01 00:00:00
2013-01-01 00:01:30
2013-01-01 00:03:00
2013-01-01 00:04:30
...................
I'm trying to get it to look like:
2013-01-01T00:00:00Z
2013-01-01T00:01:30Z
2013-01-01T00:03:00Z
2013-01-01T00:04:30Z
....................
Use datetime.strftime and call map on the index:
In [72]:
NumberOfSamples = 10
import datetime as dt
dates = pd.date_range('20130101',periods=NumberOfSamples,freq='90S')
df3 = pd.DataFrame(index=dates)
df3.index = df3.index.map(lambda x: dt.datetime.strftime(x, '%Y-%m-%dT%H:%M:%SZ'))
df3
Out[72]:
Empty DataFrame
Columns: []
Index: [2013-01-01T00:00:00Z, 2013-01-01T00:01:30Z, 2013-01-01T00:03:00Z, 2013-01-01T00:04:30Z, 2013-01-01T00:06:00Z, 2013-01-01T00:07:30Z, 2013-01-01T00:09:00Z, 2013-01-01T00:10:30Z, 2013-01-01T00:12:00Z, 2013-01-01T00:13:30Z]
Alternatively, and better in my view (thanks to @unutbu), you can pass a format specifier to to_csv:
df3.to_csv('dates.txt', header=False, date_format='%Y-%m-%dT%H:%M:%SZ')
With pd.Index.strftime:
If you're sure that all your dates are UTC, you can hardcode the format:
df3.index = df3.index.strftime('%Y-%m-%dT%H:%M:%SZ')
which gives you 2013-01-01T00:00:00Z and so on. Note that the "Z" denotes UTC!
With pd.Timestamp.isoformat and pd.Index.map:
df3.index = df3.index.map(lambda timestamp: timestamp.isoformat())
This gives you 2013-01-01T00:00:00. If you attach a timezone to your dates first (e.g. by passing tz="UTC" to date_range), you'll get: 2013-01-01T00:00:00+00:00 which also conforms to ISO-8601 but is a different notation. This should work for any dateutil or pytz timezone, leaving no room for ambiguity when clocks switch from daylight saving to standard time.
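A minimal sketch of that timezone-aware variant, assuming UTC is the correct zone for the data:

```python
import pandas as pd

# attach a timezone up front; isoformat() then carries an explicit UTC offset
dates = pd.date_range('20130101', periods=3, freq='90s', tz='UTC')
iso_first = dates[0].isoformat()  # '2013-01-01T00:00:00+00:00'
```

The "+00:00" suffix and the hardcoded "Z" denote the same instant; the tz-aware route simply derives the offset from the index instead of trusting the format string.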
