Converting numeric SAS dates to datetimes in Pandas - python

I am currently trying to reproduce the answer from "convert numeric sas date to datetime in Pandas", but I get the following error:
"Python int too large to convert to C long"
Here is an example of my dates:
0 1.416096e+09
1 1.427069e+09
2 1.433635e+09
3 1.428624e+09
4 1.433117e+09
Name: dates, dtype: float64
Any ideas?

Here is a slightly hacky solution. If the date column is called 'date', try:
df['date'] = pd.to_datetime(df['date'] - 315619200, unit='s')
Here 315619200 is the number of seconds between Jan 1 1960 (the SAS epoch) and Jan 1 1970 (the Unix epoch that pandas uses).
You get:
0 2004-11-15 00:00:00
1 2005-03-22 00:03:20
2 2005-06-05 23:56:40
3 2005-04-09 00:00:00
4 2005-05-31 00:03:20
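The same conversion can also be sketched with the origin argument of pd.to_datetime, which states the 1960 epoch directly instead of subtracting a constant. This assumes the values are SAS datetimes (seconds since Jan 1 1960) rather than SAS dates (days since Jan 1 1960); reading second counts as days is a likely, though unconfirmed, cause of the overflow error in the question. The DataFrame is rebuilt from the rounded values shown in the question, and the names sas_offset, by_offset and by_origin are only illustrative.

```python
import pandas as pd

# Sample values as displayed in the question (rounded; seconds since 1960-01-01).
df = pd.DataFrame({'date': [1.416096e+09, 1.427069e+09, 1.433635e+09,
                            1.428624e+09, 1.433117e+09]})

# Offset between the SAS epoch (1960) and the Unix epoch (1970), in seconds.
sas_offset = (pd.Timestamp('1970-01-01') - pd.Timestamp('1960-01-01')).total_seconds()

# Option 1: subtract the offset by hand, as in the answer above.
by_offset = pd.to_datetime(df['date'] - sas_offset, unit='s')

# Option 2: let pandas apply the offset through the origin argument.
by_origin = pd.to_datetime(df['date'], unit='s', origin=pd.Timestamp('1960-01-01'))

print(by_origin)
```

Both options should give the same result; the origin form simply keeps the magic number out of the code.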

Related

Issue with converting a pandas column from int64 to datetime64

I'm trying to convert a column of Year values from int64 to datetime64 in pandas. The column currently looks like
Year
2003
2003
2003
2003
2003
...
2021
2021
2021
2021
2021
However the data type listed when I use dataset['Year'].dtypes is int64.
That's after I used pd.to_datetime(dataset.Year, format='%Y') to convert the column from int64 to datetime64. How do I get around this?
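You have to assign the result of pd.to_datetime(df['Year'], format="%Y") back to a column, for example df['date']. pd.to_datetime returns a new Series and does not modify the original column in place. Once you do that, the conversion from integer works: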
df = pd.DataFrame({'Year': [2000,2000,2000,2000,2000,2000]})
df['date'] = pd.to_datetime(df['Year'], format="%Y")
df
The output should be:
Year date
0 2000 2000-01-01
1 2000 2000-01-01
2 2000 2000-01-01
3 2000 2000-01-01
4 2000 2000-01-01
5 2000 2000-01-01
So essentially all your code is missing is the assignment df['date'] = pd.to_datetime(df['Year'], format="%Y"); with it, the conversion should work fine.
Note that pd.to_datetime() does not return just the year (which, as far as I understood from your question, is what you wanted). It returns full dates, with the month and day defaulting to January 1. For more information on what pd.to_datetime() returns, see the documentation.
I hope this helps.
You should be able to convert from an integer:
df = pd.DataFrame({'Year': [2003, 2022]})
df['datetime'] = pd.to_datetime(df['Year'], format='%Y')
print(df)
Output:
Year datetime
0 2003 2003-01-01
1 2022 2022-01-01
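The point both answers make, that the result has to be assigned, can be checked with a small sketch. The sample data and the column name year_again are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2003, 2003, 2021, 2021]})

# pd.to_datetime returns a new Series; without an assignment the column is unchanged.
pd.to_datetime(df['Year'], format='%Y')
print(df['Year'].dtype)  # still an integer dtype

# Assign the result to a column to keep it.
df['date'] = pd.to_datetime(df['Year'], format='%Y')
print(df.dtypes)

# If only the year is needed later, it can be read back from the datetime column.
df['year_again'] = df['date'].dt.year
```

Keeping the original integer column next to the datetime column is a choice made here for clarity; overwriting df['Year'] works the same way.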

I have a data frame whose date column looks like this:

Date
-1.476329
-2.754683
-0.763295
-3.113292
-1.353446
When I try to convert these negative float values into dd-mm-yyyy, I get the year 1969 or so, with almost the same date in every row. But the years should be around 2018-2020.
Computers store time as an offset from 01 Jan 1970. Since you didn't give any insight into your algorithm, I can only guess that the conversion treats your float values as offsets from this default epoch.
Maybe "Datetime defaulting to 1970 in pandas" will help?
As your dates should have years around 2018-2020, your Date column probably contains a number of years relative to now (or to another base date). If so, you can do the following:
Find out what base date the dates are relative to. For demo purposes, I set it to today's date:
base_date = pd.to_datetime('now').normalize()
Then derive the calendar dates from your Date column by multiplying it by a duration of one year, np.timedelta64(1, 'Y'), and adding the base date:
import numpy as np
df['Date_Derived'] = base_date + df['Date'] * np.timedelta64(1, 'Y')
Result:
print(df)
Date Date_Derived
0 -1.476329 2020-01-05 18:45:56.610792
1 -2.754683 2018-09-25 20:56:40.793784
2 -0.763295 2020-09-22 05:05:36.323160
3 -3.113292 2018-05-17 21:26:33.794016
4 -1.353446 2020-02-19 15:56:09.543408
You can further truncate the time values by:
df['Date_Derived'] = df['Date_Derived'].dt.normalize()
Result:
print(df)
Date Date_Derived
0 -1.476329 2020-01-05
1 -2.754683 2018-09-25
2 -0.763295 2020-09-22
3 -3.113292 2018-05-17
4 -1.353446 2020-02-19
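Recent pandas releases may reject the 'Y' unit for timedeltas, because a year has no fixed length. A minimal sketch that avoids it converts the year fractions to days first. Two assumptions are made here: a year is taken as 365.2425 days (the mean Gregorian year), and the base date is fixed to 2021-06-28, which appears consistent with the output shown above rather than being stated in the answer.

```python
import pandas as pd

df = pd.DataFrame({'Date': [-1.476329, -2.754683, -0.763295, -3.113292, -1.353446]})

# Assumption: fixed base date, inferred from the output shown above.
base_date = pd.Timestamp('2021-06-28')

# Assumption: one year is the mean Gregorian year of 365.2425 days.
DAYS_PER_YEAR = 365.2425

# Convert fractional years to day-based timedeltas, then shift the base date.
offsets = pd.to_timedelta(df['Date'] * DAYS_PER_YEAR, unit='D')
df['Date_Derived'] = (base_date + offsets).dt.normalize()
print(df)
```

Whether the column really holds years relative to a base date is still a guess; the base date and the year length should be replaced with whatever the data source actually defines.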

Convert date column formatted as xx:xx.x

I have come across a CSV file that contains a date column formatted in the following manner: xx:xx.x. Here are a few of the values present in the column marked as date:
07:33.0
34:53.0
06:30.0
30:09.0
02:18.0
My question is what type of formatting is this? And how can I convert it to a proper date format using Python?
These look like times without hours (minutes:seconds.fraction).
You can create timedeltas with to_timedelta by prepending 0 hours:
df['col'] = pd.to_timedelta('00:' + df['col'])
print (df)
col
0 0 days 00:07:33
1 0 days 00:34:53
2 0 days 00:06:30
3 0 days 00:30:09
4 0 days 00:02:18
Or convert to datetimes with to_datetime, in which case a default date (1900-01-01) is added:
df['col'] = pd.to_datetime(df['col'], format='%M:%S.%f')
print (df)
col
0 1900-01-01 00:07:33
1 1900-01-01 00:34:53
2 1900-01-01 00:06:30
3 1900-01-01 00:30:09
4 1900-01-01 00:02:18
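If the values really are minutes and seconds, the timedelta form is usually easier to work with than datetimes carrying a placeholder date. The sketch below rebuilds the sample column from the question and adds a numeric seconds column; the column names duration and seconds are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'col': ['07:33.0', '34:53.0', '06:30.0', '30:09.0', '02:18.0']})

# Prefix a zero hour so the strings parse as HH:MM:SS.f durations.
df['duration'] = pd.to_timedelta('00:' + df['col'])

# Durations support arithmetic directly, for example total seconds.
df['seconds'] = df['duration'].dt.total_seconds()
print(df)
```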

Need to convert epoch time to EST using pandas

I have a dataframe df like this:
timestamp values
0 1574288141 34
1 1574288241 23
2 1574288341 22
3 1574288441 10
Here the timestamp column holds epoch time. I want to convert it into a datetime in the format 2019-11-20 04:03:01, expressed in EST.
When I do
pd.to_datetime(df['timestamp'], unit='s')
I get the conversion and the required format but the time doesn't seem to be in EST. It is 4 hours ahead of EST.
I have tried to convert utc to Eastern using the code
pd.to_datetime(df['timestamp'], unit='s').tz_localize('utc').dt.tz_convert('US/Eastern')
But I am getting an error
TypeError: index is not a valid DatetimeIndex or PeriodIndex
You should add the .dt accessor before tz_localize, since your input is a Series, not an index:
pd.to_datetime(df.timestamp,unit='s').dt.tz_localize('utc').dt.tz_convert('US/Eastern')
Out[8]:
0 2019-11-20 17:15:41-05:00
1 2019-11-20 17:17:21-05:00
2 2019-11-20 17:19:01-05:00
3 2019-11-20 17:20:41-05:00
Name: timestamp, dtype: datetime64[ns, US/Eastern]
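A slightly shorter variant, as a sketch: passing utc=True to pd.to_datetime returns timezone-aware UTC values, so the tz_localize step can be dropped. The strftime call then produces the plain format asked for in the question, without the offset suffix. The column name eastern_str is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'timestamp': [1574288141, 1574288241, 1574288341, 1574288441],
                   'values': [34, 23, 22, 10]})

# utc=True yields timezone-aware UTC values, so tz_localize is not needed.
eastern = pd.to_datetime(df['timestamp'], unit='s', utc=True).dt.tz_convert('US/Eastern')

# Format as plain strings without the UTC offset suffix.
df['eastern_str'] = eastern.dt.strftime('%Y-%m-%d %H:%M:%S')
print(df)
```

Note that 'US/Eastern' follows daylight saving time, so the offset is -05:00 in winter and -04:00 in summer.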

Sort date in string format in a pandas dataframe?

I have a dataframe like this. How can I sort it by date?
df = pd.DataFrame({'Date':['Oct20','Nov19','Jan19','Sep20','Dec20']})
Date
0 Oct20
1 Nov19
2 Jan19
3 Sep20
4 Dec20
I am familiar with sorting a list of date strings:
a.sort(key=lambda date: datetime.strptime(date, "%d-%b-%y"))
Any thoughts? Should I split it?
First convert the column to datetimes, then get the positions of the sorted values with Series.argsort and use them to reorder the rows with DataFrame.iloc:
df = df.iloc[pd.to_datetime(df['Date'], format='%b%y').argsort()]
print (df)
Date
2 Jan19
1 Nov19
3 Sep20
0 Oct20
4 Dec20
Details:
print (pd.to_datetime(df['Date'], format='%b%y'))
0 2020-10-01
1 2019-11-01
2 2019-01-01
3 2020-09-01
4 2020-12-01
Name: Date, dtype: datetime64[ns]
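An alternative sketch, assuming pandas 1.1 or later: DataFrame.sort_values accepts a key function, so the datetime conversion can be used only for ordering while the column keeps its original strings. The name sorted_df is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['Oct20', 'Nov19', 'Jan19', 'Sep20', 'Dec20']})

# key receives the column as a Series and must return a Series to sort by.
sorted_df = df.sort_values('Date', key=lambda s: pd.to_datetime(s, format='%b%y'))
print(sorted_df)
```

The '%b' directive matches abbreviated month names in the current locale, so this assumes English month abbreviations.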
