I have a dataframe df like this:
timestamp values
0 1574288141 34
1 1574288241 23
2 1574288341 22
3 1574288441 10
Here timestamp holds epoch time in seconds. I want to convert it to a datetime in the format 2019-11-20 04:03:01, and I would like the result to be an EST date.
When I do
pd.to_datetime(df['timestamp'], unit='s')
I get the conversion and the required format, but the time isn't in EST; it is in UTC (five hours ahead of EST in November).
I have tried to convert UTC to Eastern using the code
pd.to_datetime(df['timestamp'], unit='s').tz_localize('utc').dt.tz_convert('US/Eastern')
But I am getting an error
TypeError: index is not a valid DatetimeIndex or PeriodIndex
You should add .dt, since your input is a Series, not an Index:
pd.to_datetime(df.timestamp,unit='s').dt.tz_localize('utc').dt.tz_convert('US/Eastern')
Out[8]:
0 2019-11-20 17:15:41-05:00
1 2019-11-20 17:17:21-05:00
2 2019-11-20 17:19:01-05:00
3 2019-11-20 17:20:41-05:00
Name: timestamp, dtype: datetime64[ns, US/Eastern]
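If you also want the plain %Y-%m-%d %H:%M:%S format from the question (no UTC offset in the output), you can drop the timezone information after converting. A small sketch, assuming the naive Eastern wall-clock time is what you want:
eastern = pd.to_datetime(df.timestamp, unit='s').dt.tz_localize('utc').dt.tz_convert('US/Eastern')
df['timestamp'] = eastern.dt.tz_localize(None)  # strip the offset, keep Eastern wall-clock time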
I have a dataframe, loaded from a file, containing a time series and values:
datetime value_a
0 2019-08-19 00:00:00 194.32000000
1 2019-08-20 00:00:00 202.24000000
2 2019-08-21 00:00:00 196.55000000
3 2019-08-22 00:00:00 187.45000000
4 2019-08-23 00:00:00 190.36000000
After I convert the first column to string, the hours, minutes, and seconds vanish:
datetime value_a
0 2019-08-19 194.32000000
1 2019-08-20 202.24000000
2 2019-08-21 196.55000000
3 2019-08-22 187.45000000
4 2019-08-23 190.36000000
Code snippet:
df['datetime'] = df['datetime'].astype(str)
I kinda need the format %Y-%m-%d %H:%M:%S, because we are using it later.
What is wrong?
NOTE: I initially thought that the issue was in the conversion from object to datetime; however, thanks to user #SomeDude, I have discovered that I am losing h/m/s during the conversion to string.
It seems the error can be fixed by using a different conversion method with an explicit format definition:
df['datetime'] = df['datetime'].dt.strftime("%Y-%m-%d %H:%M:%S")
This works.
You're saying "I don't like the default format".
Ok. So be explicit, include HMS in it when you re-format.
>>> df = pd.DataFrame([dict(datetime='2019-08-19 00:00:00', value_a=194.32)])
>>> df['datetime'] = pd.to_datetime(df.datetime)
>>>
>>> df['datetime'] = df.datetime.dt.strftime("%Y-%m-%d %H:%M:%S")
>>> df
datetime value_a
0 2019-08-19 00:00:00 194.32
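For contrast, the astype(str) path from the question drops the midnight time component, which is exactly the behaviour observed above (details may vary by pandas version):
>>> df = pd.DataFrame([dict(datetime='2019-08-19 00:00:00', value_a=194.32)])
>>> df['datetime'] = pd.to_datetime(df.datetime)
>>> df['datetime'].astype(str)
0    2019-08-19
Name: datetime, dtype: object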
I am trying to convert a column in my df into a time series. The dataset runs from March 23rd 2015 to August 17th 2019 and looks like this:
time 1day_active_users
0 2015-03-23 00:00:00-04:00 19687.0
1 2015-03-24 00:00:00-04:00 19437.0
I am trying to convert the time column into a datetime series, but it comes back as object dtype. Here is the code:
data = pd.read_csv(data_path)
data.set_index('time', inplace=True)
data.index= pd.to_datetime(data.index)
data.index.dtype
data.index.dtype returns dtype('O'). I assume this is why, when I try to index an element by time, I get an error. For example, when I run this:
data.loc['2015']
It gives me this error
KeyError: '2015'
Any help or feedback would be appreciated. Thank you.
As commented, the problem is likely the mixed UTC offsets (the data spans a daylight-saving change), which make pd.to_datetime return an object-dtype column instead of datetime64. Try passing utc=True to pd.to_datetime:
df['time'] = pd.to_datetime(df['time'],utc=True)
df['time']
Test data (note the two different UTC offsets):
time 1day_active_users
0 2015-03-23 00:00:00-04:00 19687.0
1 2015-03-24 00:00:00-05:00 19437.0
Output:
0 2015-03-23 04:00:00+00:00
1 2015-03-24 05:00:00+00:00
Name: time, dtype: datetime64[ns, UTC]
And then:
df.set_index('time', inplace=True)
df.loc['2015']
gives
1day_active_users
time
2015-03-23 04:00:00+00:00 19687.0
2015-03-24 05:00:00+00:00 19437.0
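If you would rather keep the index in Eastern time than in UTC, you can convert it after parsing. A sketch, assuming all rows belong to US/Eastern:
df.index = df.index.tz_convert('US/Eastern')
df.loc['2015']  # partial-string indexing works now that the index is a proper DatetimeIndex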
I have a column in a dataframe with multiple date formats that need to be converted to datetime.
date amount
September 2018 15
Sep-18 20
The output should look like
date amount
2018-09-01 15
2018-09-01 20
Using pd.to_datetime(df['date']) returns the error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-09-18 00:00:00
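One common workaround (a sketch, not taken from the original thread) is to parse each known format separately with errors='coerce' and combine the results:
import pandas as pd

df = pd.DataFrame({'date': ['September 2018', 'Sep-18'], 'amount': [15, 20]})

# Try the full month-name format first, then fall back to the abbreviated one.
parsed = pd.to_datetime(df['date'], format='%B %Y', errors='coerce')
df['date'] = parsed.fillna(pd.to_datetime(df['date'], format='%b-%y', errors='coerce'))
Both rows then come out as 2018-09-01, matching the desired output.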
I am currently trying to reproduce this: convert numeric sas date to datetime in Pandas, but I get the following error:
"Python int too large to convert to C long"
Here is an example of my dates:
0 1.416096e+09
1 1.427069e+09
2 1.433635e+09
3 1.428624e+09
4 1.433117e+09
Name: dates, dtype: float64
Any ideas?
Here is a little hacky solution. If the date column is called 'date', try
df['date'] = pd.to_datetime(df['date'] - 315619200, unit = 's')
Here 315619200 is the number of seconds between Jan 1 1960 (the SAS epoch) and Jan 1 1970 (the Unix epoch): 3,653 days, including three leap days, times 86,400 seconds per day.
You get
0 2004-11-15 00:00:00
1 2005-03-22 00:03:20
2 2005-06-05 23:56:40
3 2005-04-09 00:00:00
4 2005-05-31 00:03:20
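Newer pandas versions (0.20 and later) also accept the SAS epoch directly through the origin parameter, which avoids the manual offset arithmetic:
df['date'] = pd.to_datetime(df['date'], unit='s', origin='1960-01-01')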
I have read-only access to a database that I query and read into a Pandas dataframe using pymssql. One of the variables contains dates, some of which are stored as midnight on 01 Jan 0001 (i.e. 0001-01-01 00:00:00.0000000). I've no idea why those dates should be included – as far as I know, they are not recognised as a valid date by SQL Server and they are probably due to some default data entry. Nevertheless, that's what I have to work with. This can be recreated as a dataframe as follows:
import numpy as np
import pandas as pd

tempDF = pd.DataFrame({'id': [0, 1, 2, 3, 4],
                       'date': ['0001-01-01 00:00:00.0000000',
                                '2015-05-22 00:00:00.0000000',
                                '0001-01-01 00:00:00.0000000',
                                '2015-05-06 00:00:00.0000000',
                                '2015-05-03 00:00:00.0000000']})
The dataframe looks like:
print(tempDF)
date id
0 0001-01-01 00:00:00.0000000 0
1 2015-05-22 00:00:00.0000000 1
2 0001-01-01 00:00:00.0000000 2
3 2015-05-06 00:00:00.0000000 3
4 2015-05-03 00:00:00.0000000 4
... with the following dtypes:
print(tempDF.dtypes)
date object
id int64
dtype: object
I routinely convert date fields in the dataframe to datetime format using:
tempDF['date'] = pd.to_datetime(tempDF['date'])
However, by chance, I've noticed that the 0001-01-01 date is converted to 2001-01-01.
print(tempDF)
date id
0 2001-01-01 0
1 2015-05-22 1
2 2001-01-01 2
3 2015-05-06 3
4 2015-05-03 4
I realise that the dates in the original database are incorrect because SQL Server doesn't see 0001-01-01 as a valid date. But at least in the 0001-01-01 format, such missing data are easy to identify within my Pandas dataframe. However, when pandas.to_datetime() changes these dates so they lie within a feasible range, it is very easy to miss such outliers.
How can I make sure that pd.to_datetime doesn't interpret the outlier dates incorrectly?
If you provide a format, these dates will not be recognized:
In [92]: pd.to_datetime(tempDF['date'], format="%Y-%m-%d %H:%M:%S.%f", errors='coerce')
Out[92]:
0 NaT
1 2015-05-22
2 NaT
3 2015-05-06
4 2015-05-03
Name: date, dtype: datetime64[ns]
By default it will error, but by passing errors='coerce', they are converted to NaT values (coerce=True for older pandas versions).
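This also makes the placeholder rows easy to locate afterwards. A small sketch:
mask = pd.to_datetime(tempDF['date'], format="%Y-%m-%d %H:%M:%S.%f", errors='coerce').isna()
print(tempDF.loc[mask, 'id'])  # the rows that held the 0001-01-01 placeholder dates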
The reason pandas converts these "0001-01-01" dates to "2001-01-01" when no format is provided is that this is the behaviour of dateutil:
In [32]: import dateutil
In [33]: dateutil.parser.parse("0001-01-01")
Out[33]: datetime.datetime(2001, 1, 1, 0, 0)