Python: Convert numeric value to date like SAS

I have a set of numeric values that represent dates exported from SAS, but they arrive as raw numbers rather than formatted dates. For example, the value 5893 corresponds to 19.02.1976 when formatted correctly in SAS. I want to do the same conversion in Python/PySpark. So far I have found the function fromtimestamp.
However, when I use it, I get the wrong date:
value = 5893
date = datetime.datetime.fromtimestamp(value)
print(date)
1970-01-01 02:38:13
Any proposals to get the correct date? Thank you! :-)
EDIT: And what would the code look like when this operation is applied to a dataframe column rather than a single variable?

The Epoch, as far as SAS is concerned, is 1st January 1960. The number you have (5893) is the number of elapsed days since that Epoch. Therefore:
from datetime import timedelta, date
print(date(1960, 1, 1) + timedelta(days=5893))
...will give you the desired result, 1976-02-19.
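The same epoch arithmetic can be wrapped in a small helper. This is only a minimal sketch: the names SAS_EPOCH, sas_to_date and date_to_sas are made up for illustration and are not part of any library.

```python
from datetime import date, timedelta

# SAS counts dates as whole days elapsed since 1 January 1960.
SAS_EPOCH = date(1960, 1, 1)

def sas_to_date(days):
    """Convert a SAS date value (days since 1960-01-01) to a datetime.date."""
    return SAS_EPOCH + timedelta(days=int(days))

def date_to_sas(d):
    """Inverse conversion: datetime.date back to a SAS day count."""
    return (d - SAS_EPOCH).days

print(sas_to_date(5893))               # 1976-02-19
print(date_to_sas(date(1976, 2, 19)))  # 5893
```

The round trip is a quick way to check that the epoch is right before converting a whole dataset.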

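For a pandas Series (or a DataFrame column), convert the day counts to timedeltas and add the SAS epoch:
import pandas as pd
ser = pd.Series([19411.0, 19325.0, 19325.0, 19443.0, 19778.0])
# Days since 1960-01-01 -> datetime64 values
ser = pd.to_timedelta(ser, unit='D') + pd.Timestamp('1960-1-1')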
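For the dataframe-column case asked about in the EDIT, another option that should give the same result is pd.to_datetime with unit and origin. A sketch, assuming a hypothetical column named sas_date:

```python
import pandas as pd

# Hypothetical frame: 'sas_date' holds SAS day counts (days since 1960-01-01).
df = pd.DataFrame({"sas_date": [5893, 0, 19411]})

# unit='D' reads the numbers as days; origin moves the epoch from 1970 to 1960.
df["date"] = pd.to_datetime(df["sas_date"], unit="D", origin="1960-01-01")

print(df)
```

Keeping the raw numbers in their own column makes it easy to compare against the SAS output while testing.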

Related

Converting date between DD/MM/YYYY and YYYY-MM-DD?

Using a Python script, I need to read a CSV file where dates are formatted as DD/MM/YYYY, and convert them to YYYY-MM-DD before saving this into a SQLite database.
This almost works, but fails because I don't provide time:
from datetime import datetime
lastconnection = datetime.strptime("21/12/2008", "%Y-%m-%d")
#ValueError: time data did not match format: data=21/12/2008 fmt=%Y-%m-%d
print lastconnection
I assume there's a method in the datetime object to perform this conversion very easily, but I can't find an example of how to do it. Thank you.
Your example code is wrong. This works:
import datetime
datetime.datetime.strptime("21/12/2008", "%d/%m/%Y").strftime("%Y-%m-%d")
The call to strptime() parses the first argument according to the format specified in the second, so those two need to match. Then you can call strftime() to format the result into the desired final format.
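As a small sketch of that parse-then-format round trip (the function name ddmmyyyy_to_iso is made up for illustration):

```python
from datetime import datetime

def ddmmyyyy_to_iso(text):
    """Parse a DD/MM/YYYY string and return it as YYYY-MM-DD."""
    parsed = datetime.strptime(text, "%d/%m/%Y")   # format must match the input
    return parsed.strftime("%Y-%m-%d")             # format of the desired output

print(ddmmyyyy_to_iso("21/12/2008"))  # 2008-12-21

# A mismatched format raises ValueError instead of guessing.
try:
    datetime.strptime("21/12/2008", "%Y-%m-%d")
except ValueError as exc:
    print("parse failed:", exc)
```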
You first need to convert the string into a datetime object, and then convert that datetime object back to a string. It goes like this (after from datetime import datetime):
lastconnection = datetime.strptime("21/12/2008", "%d/%m/%Y").strftime('%Y-%m-%d')
I am new to programming. I wanted to convert from yyyy-mm-dd to dd/mm/yyyy to print out a date in the format that people in my part of the world use and recognise.
The accepted answer above got me on the right track.
The answer I ended up with to my problem is:
import datetime
today_date = datetime.date.today()
print(today_date)
new_today_date = today_date.strftime("%d/%m/%Y")
print (new_today_date)
The first two lines after the import statement give today's date in ISO 8601 format (2017-01-26). The last two lines convert this to the format recognised in the UK and other countries (26/01/2017).
You can shorten this code, but I left it as is because it is helpful to me as a beginner. I hope this helps other beginner programmers starting out!
Does anyone else think it's a waste to convert these strings to date/time objects for what is, in the end, a simple text transformation? If you're certain the incoming dates will be valid, you can just use:
>>> ddmmyyyy = "21/12/2008"
>>> yyyymmdd = ddmmyyyy[6:] + "-" + ddmmyyyy[3:5] + "-" + ddmmyyyy[:2]
>>> yyyymmdd
'2008-12-21'
This will almost certainly be faster than the conversion to and from a date.
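The "if you're certain the incoming dates will be valid" condition matters. A short sketch of the trade-off, using two hypothetical helper names: the slicing version accepts any ten-character string, while the strptime version rejects impossible dates.

```python
from datetime import datetime

def slice_convert(ddmmyyyy):
    """Pure string reshuffle: no validation of the date itself."""
    return ddmmyyyy[6:] + "-" + ddmmyyyy[3:5] + "-" + ddmmyyyy[:2]

def parsed_convert(ddmmyyyy):
    """Round trip through datetime: invalid dates raise ValueError."""
    return datetime.strptime(ddmmyyyy, "%d/%m/%Y").strftime("%Y-%m-%d")

print(slice_convert("21/12/2008"))   # 2008-12-21
print(slice_convert("31/02/2008"))   # 2008-02-31 (not a real date, but accepted)

try:
    parsed_convert("31/02/2008")
except ValueError:
    print("31/02/2008 rejected by strptime")
```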
# case_date holds a date string in mm/dd/yyyy format
case_date = "03/31/2020"
demo = case_date.split("/")
# Swap month and day and join with hyphens to get dd-mm-yyyy
new_case_date = demo[1] + "-" + demo[0] + "-" + demo[2]
print(new_case_date)  # 31-03-2020
If you need to convert an entire column (from pandas DataFrame), first convert it (pandas Series) to the datetime format using to_datetime and then use .dt.strftime:
def conv_dates_series(df, col, old_date_format, new_date_format):
    df[col] = pd.to_datetime(df[col], format=old_date_format).dt.strftime(new_date_format)
    return df
Sample usage:
import pandas as pd
test_df = pd.DataFrame({"Dates": ["1900-01-01", "1999-12-31"]})
old_date_format='%Y-%m-%d'
new_date_format='%d/%m/%Y'
conv_dates_series(test_df, "Dates", old_date_format, new_date_format)
Dates
0 01/01/1900
1 31/12/1999
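The simplest way:
While reading the CSV file, pass the parse_dates argument:
df = pd.read_csv("sample.csv", parse_dates=['column_name'])
This parses the named column into datetime values, which pandas displays as YYYY-MM-DD. For DD/MM/YYYY input, also pass dayfirst=True so that ambiguous dates such as 03/04/2009 are read day-first.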
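A self-contained sketch of the read_csv approach for day-first input. The CSV text and the column name last_connection are made up for illustration, and io.StringIO stands in for a real file; dayfirst=True is added here because the dates are DD/MM/YYYY.

```python
import io
import pandas as pd

# Stand-in for a file on disk; 'last_connection' is a hypothetical column name.
csv_text = "user,last_connection\nalice,21/12/2008\nbob,03/04/2009\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["last_connection"],
    dayfirst=True,   # read 03/04/2009 as 3 April, not 4 March
)

print(df.dtypes)
# Format back to text only when writing out, e.g. for SQLite.
df["last_connection_iso"] = df["last_connection"].dt.strftime("%Y-%m-%d")
print(df)
```

Keeping the parsed column as real datetimes and formatting only at the output step avoids re-parsing strings later.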
Convert date format DD/MM/YYYY to YYYY-MM-DD according to your question, you can use this:
from datetime import datetime
lastconnection = datetime.strptime("21/12/2008", "%d/%m/%Y").strftime("%Y-%m-%d")
print(lastconnection)
df is your data frame
Dateclm is the column that you want to change
This column should be in DateTime datatype.
df['Dateclm'] = pd.to_datetime(df['Dateclm'])
df.dtypes
#Here is the solution to change the format of the column
df["Dateclm"] = pd.to_datetime(df["Dateclm"]).dt.strftime('%Y-%m-%d')
print(df)

How to convert dataframe dates into floating point numbers?

I am trying to import a dataframe from a spreadsheet using pandas and then carry out numpy operations with its columns. The problem is that I obtain the error specified in the title: TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value.
The reason for this is that my dataframe contains a column with dates, like:
ID Date
519457 25/02/2020 10:03
519462 25/02/2020 10:07
519468 25/02/2020 10:12
... ...
And Numpy requires the format to be floating point numbers, as so:
ID Date
519457 43886.41875
519462 43886.42153
519468 43886.425
... ...
How can I make this change without having to modify the spreadsheet itself?
I have seen a lot of posts on the forum asking the opposite, and asking about the error, and read the docs on xlrd.xldate, but have not managed to do this, which seems very simple.
I am sure this kind of problem has been dealt with before, but have not been able to find a similar post.
The code I am using is the following
xls=pd.ExcelFile(r'/home/.../TwoData.xlsx')
xls.sheet_names
df=pd.read_excel(xls,"Hoja 1")
df["E_t"]=df["Date"].diff()
Any help or pointers would be really appreciated!
PS. I have seen solutions that require computing the exact number that wants to be obtained, but this is not possible in this case due to the size of the dataframes.
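You can convert the date into a Unix timestamp. In Python, if you have a datetime object in UTC, you can call timestamp() to get a UTC timestamp. This function returns the number of seconds since the epoch (1 January 1970) for that datetime object, so the result is in seconds rather than the spreadsheet-style day numbers shown in the question.
Please see an example below:
from datetime import datetime, timezone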
dt = datetime(2015, 10, 19)
timestamp = dt.replace(tzinfo=timezone.utc).timestamp()
print(timestamp)
1445212800.0
Please check the datetime module for more info.
I think you need:
#https://stackoverflow.com/a/9574948/2901002
#rewritten to vectorized solution
def excel_date(date1):
    temp = pd.Timestamp(1899, 12, 30) # Note, not 31st Dec but 30th!
    delta = date1 - temp
    return (delta.dt.days) + (delta.dt.seconds) / 86400
df["Date"] = pd.to_datetime(df["Date"]).pipe(excel_date)
print (df)
ID Date
0 519457 43886.418750
1 519462 43886.421528
2 519468 43886.425000
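The same serial-number arithmetic for a single value, using only the standard library. This is a sketch: EXCEL_EPOCH and to_excel_serial are illustrative names, and the 1899-12-30 origin is taken from the answer above.

```python
from datetime import datetime

# Excel's day 0 for the 1900 date system, as used in the answer above.
EXCEL_EPOCH = datetime(1899, 12, 30)

def to_excel_serial(moment):
    """Days (with fractional time of day) between EXCEL_EPOCH and `moment`."""
    return (moment - EXCEL_EPOCH).total_seconds() / 86400

print(round(to_excel_serial(datetime(2020, 2, 25, 10, 3)), 5))  # 43886.41875
```

The fractional part is the time of day: 10:03 is 603 minutes out of 1440, i.e. 0.41875.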

Issue with datetime remaining at epoch

I've got a dataframe with one column filled with milliseconds that I've been able to convert somewhat into datetime format. The issue is that for two years worth of data, from 2017-2018, the time output remains at 1-1-1970. The output datetime looks like this:
27 1970-01-01 00:25:04.232399999
28 1970-01-01 00:25:04.232699999
29 1970-01-01 00:25:04.232999999
...
85264 1970-01-01 00:25:29.962799999
85265 1970-01-01 00:25:29.963099999
85266 1970-01-01 00:25:29.963399999
It seems to me that the milliseconds, which begin at 1504224299999 and end at 1529971499999, are getting added to the 10th hour of epoch and are not representing the true range that it should.
This is my code so far...
import pandas as pd
import MySQLdb
import datetime
from pandas import DataFrame
con = MySQLdb.connect(host='localhost',user='root',db='binance',passwd='abcde')
cur = con.cursor()
ms = pd.read_sql('SELECT close_time FROM btcusdt', con=con)
ms['close_time'].apply( lambda x: datetime.datetime.fromtimestamp(x/1000) )
date = pd.to_datetime(ms['close_time'])
print(date)
I'm not quite sure where I'm going wrong, so if anybody can tell me what I'm doing stupidly it'd be greatly appreciated.
If a function does not accept a whole Series as its argument, you can apply it element-wise with an anonymous (lambda) function.
Also, apply returns a new Series, so you need to assign the result back to the column to overwrite it:
ms['close_time'] = ms['close_time'].apply( lambda x: datetime.datetime.fromtimestamp(x/1000) )
If you want to use pandas.to_datetime directly, pass the unit:
pd.to_datetime(ms['close_time'], unit = 'ms')
PS. The two methods can give different datetimes: datetime.fromtimestamp converts to the machine's local time zone, while pd.to_datetime with unit='ms' returns UTC-based values.
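Why the original output stayed in 1970: as far as I know, pd.to_datetime treats plain integers as nanoseconds unless a unit is given, so a millisecond count only advances the clock by about 25 minutes. The sketch below reproduces both readings with the standard library, using the first value quoted in the question; EPOCH and the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
close_time_ms = 1504224299999  # first value quoted in the question

# Correct reading: the integer is milliseconds since the Unix epoch.
as_ms = EPOCH + timedelta(milliseconds=close_time_ms)
print(as_ms)   # 2017-09-01 00:04:59.999000+00:00

# What happens if the same integer is read as nanoseconds instead
# (timedelta has microsecond resolution, so the last three digits are dropped).
as_ns = EPOCH + timedelta(microseconds=close_time_ms // 1000)
print(as_ns)   # 1970-01-01 00:25:04.224299+00:00
```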

How to set a variable to be "Today's" date in Python/Pandas

I am trying to set a variable to equal today's date.
I looked this up and found a related article:
Set today date as default value in the model
However, this didn't particularly answer my question.
I used the suggested:
dt.date.today
But after
import datetime as dt
date = dt.date.today
print date
<built-in method today of type object at 0x000000001E2658B0>
Df['Date'] = date
I didn't get what I actually wanted, which was a clean date format of today's date in Month/Day/Year.
How can I create a variable of today's day in order for me to input that variable in a DataFrame?
You mention you are using Pandas (in your title). If so, there is no need to use an external library, you can just use to_datetime
>>> pandas.to_datetime('today').normalize()
Timestamp('2015-10-14 00:00:00')
This will always return today's date at midnight, irrespective of the actual time, and can be directly used in pandas to do comparisons etc. Pandas always includes 00:00:00 in its datetimes.
Replacing today with now would give you the date in UTC instead of local time; note that in neither case is the tzinfo (timezone) added.
In pandas versions prior to 0.23.x, normalize may not have been necessary to remove the non-midnight timestamp.
If you want a string mm/dd/yyyy instead of the datetime object, you can use strftime (string format time):
>>> dt.datetime.today().strftime("%m/%d/%Y")
#                    ^ note parentheses
'02/12/2014'
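The difference the parentheses make, as a minimal sketch (the variable names are illustrative):

```python
import datetime as dt

method = dt.date.today      # no parentheses: a reference to the method itself
today = dt.date.today()     # parentheses: an actual datetime.date value

print(callable(method))     # True
print(today.strftime("%m/%d/%Y"))
```

Printing method reproduces the "built-in method today" message from the question; only today can be formatted or stored in a DataFrame column as a date.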
Using pandas: pd.Timestamp("today").strftime("%m/%d/%Y")
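Older pandas versions also offered pd.datetime, but that alias has been removed from recent releases; pd.Timestamp.now() works the same way here:
pd.Timestamp.now().strftime("%d/%m/%Y")
This gives output such as '11/02/2019'.
You can add the time if you want:
pd.Timestamp.now().strftime("%d/%m/%Y %I:%M:%S")
This gives output such as '11/02/2019 11:08:26' (%I is the 12-hour clock; use %H for 24-hour time).
See the strftime format codes for the full list.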
You can also look into pandas.Timestamp, which includes methods like .now and .today.
Unlike pandas.to_datetime('now'), pandas.Timestamp.now() won't default to UTC:
import pandas as pd
pd.Timestamp.now() # will return local time (California in this example)
# Timestamp('2018-12-19 09:17:07.693648')
pd.to_datetime('now') # will return UTC time
# Timestamp('2018-12-19 17:17:08')
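The same local-versus-UTC distinction exists in the standard library. A sketch with illustrative variable names:

```python
from datetime import datetime, timezone

naive_local = datetime.now()               # local wall-clock time, tzinfo is None
aware_utc = datetime.now(timezone.utc)     # current time in UTC, tzinfo attached
aware_local = datetime.now().astimezone()  # local time with the system offset attached

print(naive_local.tzinfo)        # None
print(aware_utc.tzinfo)          # UTC
print(aware_local.utcoffset())   # depends on the machine's time zone
```

Attaching the time zone explicitly avoids guessing later whether a stored value was local or UTC.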
I had the same problem and tried many things; this is the solution I ended up with:
import time
print (time.strftime("%d/%m/%Y"))
Simply use pd.Timestamp.now(). For example:
input: pd.Timestamp.now()
output: Timestamp('2022-01-12 14:43:05.521896')
If all you want is Timestamp('2022-01-12') with nothing after the date, you can use replace to zero out the hour, minute, second and microsecond:
input: pd.Timestamp.now().replace(hour=0, minute=0, second=0, microsecond=0)
output: Timestamp('2022-01-12 00:00:00')
That looks complicated, though; a simpler way is to use normalize:
input: pd.Timestamp.now().normalize()
output: Timestamp('2022-01-12 00:00:00')
Easy solution in Python3+:
import time
todaysdate = time.strftime("%d/%m/%Y")
#with '.' instead of '/'
todaysdate = time.strftime("%d.%m.%Y")
import datetime
import pandas as pd

def today_date():
    '''
    utils:
    get today's date as a pandas Timestamp (midnight)
    '''
    date = datetime.datetime.now().date()
    date = pd.to_datetime(date)
    return date

Df['Date'] = today_date()
This can be safely used in pandas DataFrames.
There are already quite a few good answers, but to answer the more general question about "any" period:
Use the function for time periods in pandas. For Day, use 'D', for month 'M' etc.:
>pd.Timestamp.now().to_period('D')
Period('2021-03-26', 'D')
>p = pd.Timestamp.now().to_period('D')
>p.to_timestamp().strftime("%Y-%m-%d")
'2021-03-26'
note: If you need to consider UTC, you can use: pd.Timestamp.utcnow().tz_localize(None).to_period('D')...
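A short sketch of the period idea with a fixed timestamp, so the output does not depend on the day it is run (the variable names are illustrative):

```python
import pandas as pd

stamp = pd.Timestamp("2021-03-26 14:30:00")  # fixed value so the output is reproducible

day = stamp.to_period("D")
month = stamp.to_period("M")

print(day)     # 2021-03-26
print(month)   # 2021-03
print(month.to_timestamp())  # 2021-03-01 00:00:00 (start of the period)
```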
Starting from the code you already have, you can pass your date variable to pandas, provided it holds the result of calling the method, i.e. date = dt.date.today():
import pandas as pd
pd.to_datetime(date)
