Date field in SAS imported in Python pandas Dataframe [duplicate] - python

This question already has answers here:
convert a SAS datetime in Pandas
(2 answers)
Closed 6 years ago.
I have imported a SAS dataset into a Python dataframe using the Pandas read_sas(path) function. REPORT_MONTH is a column in the SAS dataset defined and saved with the DATE9. format. This field is imported as a float64 column in the dataframe and contains numbers, which are the internal values SAS uses to store dates. How can I convert this field, which was originally a date, back into a date column in the dataframe?

I don't know how Python stores dates, but SAS stores dates as numbers, counting the number of days from Jan 1, 1960. Using that, you should be able to convert it in Python to a date variable.
I'm fairly certain that formats aren't honoured when data is imported into Python, so in this case it's easy to work around; in other cases it may not be.
There's probably some sort of function in Python to create a date of Jan 1, 1960 and then increment it by the number of days you get from the imported dataset to arrive at the correct date.
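In pandas, pd.to_datetime can do this directly. The following is a minimal sketch that assumes your dataframe is called df and that REPORT_MONTH really holds SAS date values, i.e. day counts from the SAS epoch 1960-01-01:
import pandas as pd
# Assumed setup: df was created with pd.read_sas(path) and REPORT_MONTH
# holds SAS date values (days since 1960-01-01).
df['REPORT_MONTH'] = pd.to_datetime(df['REPORT_MONTH'], unit='d', origin='1960-01-01')
# Optionally keep only the date part, dropping the 00:00:00 time component:
df['REPORT_MONTH'] = df['REPORT_MONTH'].dt.date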

Related

Convert date from xlsx dataset from YYYY.TEXT (e.g. 2012.916667) to a normal date format (e.g. 01/01/2012)

I've read in an xlsx file using pandas.read_excel and the dates in the dataset have come in as values like 2012.916667, for example. I can't figure out what the actual dates are as I don't have them, so I'm not sure what the numbers mean. Anyone know how to convert them to normal dates? Thanks!
You can convert it into the regular pandas Timestamp format like so:
import pandas as pd
pd.to_datetime(2012.916667, unit='d', origin='1970-01-01')
# if the dates are loaded in a column, say, dates
pd.to_datetime(df['dates'], unit='d', origin='1970-01-01')
where the assumption is that the integer part is the number of days since the epoch (origin) and the decimal part is the fraction of a day.
Since the data is coming from an Excel file, the above assumptions are probably correct. Still, you should first get them confirmed by the data owner and use the appropriate parameters in the pandas function.

Datetime format of a Pandas dataframe column switching randomly [duplicate]

This question already has answers here:
Pandas: Datetime Improperly selecting day as month from date [duplicate]
(2 answers)
Closed 1 year ago.
I am using a dataframe which has a 'Date' column. I have used pd.to_datetime() to convert this column format to yyyy-mm-dd. However, this format is getting switched to some other format at intermittent dates in the dataframe (eg: yyyy-dd-mm).
Date
2021-02-01 <----- this is 2nd Jan, 2021
2021-01-21 <----- this is 21st Jan, 2021
Further, I have also tried using df['Date'].dt.strftime('%y-%m-%d'), but this has not helped either.
I request some guidance on the following points:
For any Date column, is it enough to just use pd.to_datetime() and rest assured that all dates will be in the correct format?
Or do I need to explicitly state the datetime format along with the pd.to_datetime() call?
The problem comes from how pandas parses dates.
When receiving 2021-02-01 it does not know whether it is Feb 1st or Jan 2nd, so it applies its default decision rules: when the date starts with the year, the next field is the month, so the result is Feb 1st.
This is not the case when parsing 2021-01-21: there is only one possible date, Jan 21st.
Take a look at the to_datetime documentation, and its dayfirst and format parameters, to force a given format when there are multiple possible parsings.
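As a minimal sketch, assuming the raw strings are year-day-month (so that 2021-02-01 really means 2 Jan 2021) and the column is called Date, you could force the parsing explicitly:
import pandas as pd
# Force an explicit format instead of letting pandas guess.
# This assumes year-day-month strings such as '2021-02-01' meaning 2 Jan 2021;
# adjust the format string to whatever the source data actually uses.
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%d-%m')
# For day-first strings such as '02/01/2021', dayfirst is another option:
# df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)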

Creating a column in Dataframe [duplicate]

This question already has answers here:
Add column with constant value to pandas dataframe [duplicate]
(4 answers)
Closed 3 years ago.
Very, very new to python and have a question:
Can someone tell me how to create a new date column that holds the date this data was collected? For example, if this is from a Jan 1.xlsx file, this column should be full of Jan 1.
I know how to create the column, but how do I populate it with Jan 1? Right now I only have to do this with one file, but I am going to have to do it for all 31 files for January.
All help greatly appreciated...
After you instantiate the dataframe (read the file into a pandas object), just do:
df["dt"] = "Jan 1"
It will populate the whole column with this one value, for all rows.
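A slightly fuller sketch, assuming the file really is named Jan 1.xlsx (the path, column name, and year below are placeholders) and that you would rather store a proper date than the string "Jan 1":
import pandas as pd
# Hypothetical file name; point this at your actual January file.
df = pd.read_excel('Jan 1.xlsx')
# Store the collection date as a real datetime (year assumed here) rather than text,
# so it sorts and filters correctly later.
df['collection_date'] = pd.Timestamp('2021-01-01')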

Finding the Max() Datetime Pandas Python

I have a question about using dates in pandas.
In the CSV I am importing (if I sort it), I find that the maximum date is 10/09/2019 18:22:00.
Immediately after importing (while the column is still an object dtype), the date that appears is 31/12/2018 12:05.
And if I convert in this way to date and time:
df['Data_Abertura_Processo'] = pd.to_datetime(df['Data_Abertura_Processo'])
the value changes to: Timestamp('2019-12-08 18:40:00').
How do I get the same maximum date that I find in the CSV when filtering in Excel itself?
Today I'm using:
df['Data_Abertura_Processo'].max()
Am I wrong in converting or in using max()?
df['Data_Abertura_Processo'] = pd.to_datetime(df['Data_Abertura_Processo'],format="%d/%m/%Y %H:%M:%S")
Make sure that your datetimes all have the same format.
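As a quick sanity check (a sketch: the CSV file name here is hypothetical and the column is assumed to be day/month/year with seconds), parsing with the explicit format and then taking the max should reproduce what Excel shows:
import pandas as pd
# Hypothetical file name; adjust to your actual CSV.
df = pd.read_csv('processos.csv')
# Parse day/month/year timestamps explicitly so 10/09/2019 is read as 10 Sep 2019.
df['Data_Abertura_Processo'] = pd.to_datetime(df['Data_Abertura_Processo'], format='%d/%m/%Y %H:%M:%S')
print(df['Data_Abertura_Processo'].max())  # should now match the maximum you see in Excel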

whole column as datetime.date(Y,m,d) in python pandas [duplicate]

This question already has answers here:
How to change the datetime format in Pandas
(8 answers)
Keep only date part when using pandas.to_datetime
(13 answers)
Closed 4 years ago.
When I read data from a database (or csv, etc.), pandas keeps storing date-only data as a Timestamp and adds 00:00:00. Is there a way to enforce that the column is of type datetime.date so that no time is stored?
I tried this, but it seems the 00:00:00 is stickier than I originally thought.
pd.to_datetime(mydf['date'], format='%Y-%m-%d').iloc[0]
Out[68]: Timestamp('2015-03-27 00:00:00')
Instead of Timestamp, can I have the whole column be of type datetime.date?
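A minimal sketch of one common approach (the one referenced by the duplicate targets above): convert the column, then keep only the date part via the .dt.date accessor, which yields a column of datetime.date objects (stored with object dtype):
import pandas as pd
# Assumes mydf['date'] holds date strings such as '2015-03-27'.
mydf['date'] = pd.to_datetime(mydf['date'], format='%Y-%m-%d').dt.date
print(type(mydf['date'].iloc[0]))  # <class 'datetime.date'>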
