Datetime format of a Pandas dataframe column switching randomly [duplicate] - python

This question already has answers here:
Pandas: Datetime Improperly selecting day as month from date [duplicate]
(2 answers)
Closed 1 year ago.
I am using a dataframe which has a 'Date' column. I have used pd.to_datetime() to convert this column format to yyyy-mm-dd. However, this format is getting switched to some other format at intermittent dates in the dataframe (eg: yyyy-dd-mm).
Date
2021-02-01 <----- this is 2nd Jan, 2021
2021-01-21 <----- this is 21st Jan, 2021
Further, I have also tried using df['Date'].dt.strftime('%y-%m-%d'), but this has not helped either.
I request some guidance on the following points:
For any Date column, is it enough to just use pd.to_datetime() and be rest assured that all dates will be in correct format?
Or do I need to state the datetime format explicitly along with pd.to_datetime()?

The problem comes from how pandas parses dates.
When receiving 2021-02-01 it does not know whether it is Feb 1st or Jan 2nd, so it applies its default decision rules: when the date starts with the year, the next field is the month, which results in Feb 1st.
This is not the case when parsing 2021-01-21: there is only one possible date, Jan 21st.
Take a look at the to_datetime documentation and its dayfirst and format parameters, which force a given format when several parsings are possible.
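A minimal sketch of forcing the format, assuming the raw column holds day-first strings such as "02/01/2021" (the question does not show the raw values, so these inputs are hypothetical):

```python
import pandas as pd

# Hypothetical raw values: 2nd Jan 2021 and 21st Jan 2021, written day first.
raw = pd.Series(["02/01/2021", "21/01/2021"])

# An explicit format removes the ambiguity: the first field is always the day.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

print(parsed.dt.strftime("%Y-%m-%d").tolist())  # ['2021-01-02', '2021-01-21']
```

With an explicit format, a value that does not match raises an error instead of being silently reinterpreted, which makes inconsistent rows easier to spot.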

Related

Why is my function parameter for Pandas to_datetime() being ignored?

I have a Pandas data frame with a column containing months and years. Unfortunately, the values are currently string objects, not datetime objects. This means that sorting by this column goes in numerical rather than chronological order. For example, 01 2020, 01 2021, 01 2022, 02 2020... etc.
I am attempting to convert the column to datetime objects using the pandas.to_datetime() function as documented here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
This works just fine, but unfortunately changes my format to yyyy-mm-dd (making up the day value since there isn't one, and arbitrarily setting it to 01).
I've used the format="" parameter, but it doesn't seem to change anything.
temp_df['Month'] = pd.to_datetime(temp_df['Month'], format="%m %Y") still results in the same date format, as if my format parameter had been %Y-%m-%d.
I've googled this a bunch and come across a number of articles and previous stack overflow posts that tell me how to solve this problem - but their solution is what I'm doing, using to_datetime(df, format). Oh, and this is in a Jupyter notebook if that matters.
Appreciate the help in advance!

python method to convert string in format "11th November" into a date

I am using python in scrapy and collecting a bunch of dates that are stored on a web page in the form of text strings like "11th November" (no year is provided).
I was trying to use
startdate = '11th November'
datetime.strptime(startdate, '%d %B')
but I don't think it likes the 'th', and I get a
ValueError: time data '11th November' does not match format '%d %B'
If I make a function to try to strip out the th, st, rd, nd from the days I figured it will strip out the same text from the month.
Is there a better way to approach turning this into a date format?
For my use, it ultimately needs to be in the ISO 8601 format YYYY-MM-DD
This is so that I can pipe it from scrapy to a database, and from that use it in a Google Spreadsheet for a javascript Google chart. I just mention this because there may be a better place to make the string-to-date change than trying to do it in python.
(As a secondary issue, I also need to figure how to add the right year to the date given that if it says 12th January that would mean Jan 2020 and not 2019. This will be based on a comparison to the date when the scrape runs. i.e. the date today.)
EDIT:
it turned out that the solution required the secondary issue to be addressed as well. Hence the choice of final answer to this question. If the secondary issue of the year was not addressed it defaulted to 1900 which was a problem.
Try this out -
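import re
import datetime
# Strip the ordinal suffix (st, nd, rd, th) from the day number only, then append the current year before parsing.
datetime_obj = datetime.datetime.strptime(re.sub(r"\b([0123]?[0-9])(st|th|nd|rd)\b", r"\1", startdate) + " " + str(datetime.datetime.now().year), "%d %B %Y")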
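The secondary issue from the question (choosing the right year) could be handled along these lines. This is only a sketch: the helper name parse_day_month and its today parameter are hypothetical, it assumes English month names, and it assumes that a date already in the past relative to the scrape date belongs to the following year. It does not handle 29 February.

```python
import re
from datetime import date, datetime

def parse_day_month(text, today=None):
    """Parse strings like '11th November' and pick the next upcoming year."""
    today = today or date.today()
    # Remove the ordinal suffix only when it directly follows the day digits.
    cleaned = re.sub(r"\b(\d{1,2})(st|nd|rd|th)\b", r"\1", text)
    parsed = datetime.strptime(f"{cleaned} {today.year}", "%d %B %Y").date()
    # Assumption: a date earlier than today refers to next year.
    if parsed < today:
        parsed = parsed.replace(year=today.year + 1)
    return parsed

# Scrape run on 1 Dec 2019: "12th January" is taken to mean January 2020.
print(parse_day_month("12th January", today=date(2019, 12, 1)).isoformat())  # 2020-01-12
```

Because the suffix is only matched after digits, month names such as "August" are left untouched, and isoformat() gives the ISO 8601 YYYY-MM-DD string the question asks for.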

Finding the Max() Datetime Pandas Python

I have a question about using dates on pandas.
In the CSV I am importing, if I sort it, I find that the maximum date is 10/09/2019 18:22:00.
Immediately after importing (while the column is still of object dtype), the maximum that appears is 31/12/2018 12:05.
And if I convert the column to datetime in this way:
df['Data_Abertura_Processo'] = pd.to_datetime(df['Data_Abertura_Processo'])
the value changes to: Timestamp('2019-12-08 18:40:00').
How do I get the same maximum date that I find in the CSV when filtering in Excel?
Today I'm using:
df['Data_Abertura_Processo'].max()
Am I wrong in how I convert, or in how I use max()?
df['Data_Abertura_Processo'] = pd.to_datetime(df['Data_Abertura_Processo'],format="%d/%m/%Y %H:%M:%S")
Make sure that all your datetimes have the same format.
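A small sketch of the effect, using made-up day-first values in the style of the question and assuming every row includes seconds:

```python
import pandas as pd

# Hypothetical rows: 10 Sep 2019, 31 Dec 2018 and 12 Aug 2019, all day first.
df = pd.DataFrame({
    "Data_Abertura_Processo": [
        "10/09/2019 18:22:00",
        "31/12/2018 12:05:00",
        "12/08/2019 18:40:00",
    ]
})

# The explicit format keeps 12/08 as 12 August rather than 8 December.
df["Data_Abertura_Processo"] = pd.to_datetime(
    df["Data_Abertura_Processo"], format="%d/%m/%Y %H:%M:%S"
)

latest = df["Data_Abertura_Processo"].max()
print(latest)  # 2019-09-10 18:22:00
```

If some rows lack the seconds part (as "31/12/2018 12:05" in the question does), this format will raise an error for them, so those rows need to be normalised first.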

Date field in SAS imported in Python pandas Dataframe [duplicate]

This question already has answers here:
convert a SAS datetime in Pandas
(2 answers)
Closed 6 years ago.
I have imported a SAS dataset into a Python dataframe using the pandas read_sas(path) function. REPORT_MONTH is a column in the SAS dataset, defined and saved with the DATE9. format. This field is imported as float64 in the dataframe and holds the numbers SAS uses internally to store dates. How can I convert this field, which was originally a date, back into a date field in the dataframe?
I don't know how python stores dates, but SAS stores dates as numbers, counting the number of days from Jan 1, 1960. Using that you should be able to convert it in python to a date variable somehow.
I'm fairly certain that when data is imported to python the formats aren't honoured so in this case it's easy to work around this, in others it may not be.
There's probably some sort of function in python to create a date of Jan 1, 1960 and then increment by the number of days you get from the imported dataset to get the correct date.
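In pandas, one way to do this is to_datetime with unit="D" and origin="1960-01-01". The sketch below uses made-up day counts rather than real read_sas output:

```python
import pandas as pd

# Hypothetical SAS date values: days since 1 Jan 1960, imported as float64.
df = pd.DataFrame({"REPORT_MONTH": [0.0, 21915.0]})

# Interpret each number as a count of days from the SAS epoch.
df["REPORT_MONTH"] = pd.to_datetime(
    df["REPORT_MONTH"], unit="D", origin="1960-01-01"
)

print(df["REPORT_MONTH"].dt.strftime("%Y-%m-%d").tolist())  # ['1960-01-01', '2020-01-01']
```

This applies to SAS date values only. SAS datetime values count seconds rather than days, so they would need unit="s" instead.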

Formatting the dates to desired format return invalid dates in python. How to control it?

I have a list of dates like:
['2013-04-06', '06/04/2013', '04/06/2013', 'Apr 06 2013']
I have used datetime.strftime() with a format such as '%d/%m/%Y' to convert the dates, but when the same date appears in different formats, such as
dd/mm/yyyy and mm/dd/yyyy
'06/04/2013' and '04/06/2013'
it returns the date as it is...
How can I solve my problem?
You can't, at least not with 100 % reliability. If your date formats are this hopelessly mixed, you will never be able to tell if 04/06/2013 is supposed to be April 6th or June 4th.
Your only chance is to take the more common variant and try that first. If that throws an error or returns an implausible date (like one in the future, if those are not permitted), try the next one.
You might also want to look into dateutil. It does its best to parse a date in any given format.
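The "try the more common variant first" approach can be sketched with the standard library. The order of CANDIDATE_FORMATS below is an assumption (day-first before month-first) and should reflect whichever variant is more common in your data. The helper name parse_date is hypothetical.

```python
from datetime import datetime

# Assumed preference order: ISO first, then day-first, then month-first.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%b %d %Y"]

def parse_date(text, formats=CANDIDATE_FORMATS):
    """Return the date from the first format that parses without error."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"no known format matches {text!r}")

for s in ["2013-04-06", "06/04/2013", "04/25/2013", "Apr 06 2013"]:
    print(s, "->", parse_date(s).isoformat())
```

Here '06/04/2013' is read as 6 April because the day-first format is tried first, while '04/25/2013' only parses as month-first because 25 cannot be a month. Truly ambiguous strings are still resolved by the preference order, not by any knowledge of what the author meant.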
Numeric dates can be ambiguous. There are a couple of solutions:
Use an abbreviated name for the month: either "apr 06 2013" or "6 apr 2013". A downside is that the abbreviations for the month are locale dependent.
Stick to the international standard ISO 8601: YYYY-MM-DD or YYYYMMDD.
If you are getting dates from user input, it is impossible to distinguish between MM-DD and DD-MM in all cases. But there are some things that can help. A number >12 is obviously a day, not a month. If you know that the user is from Europe, the date will probably be DD-MM-YYYY. A person from the USA is more likely to use MM-DD-YYYY.
