Date Formatting Problem in pandas Dataframe

I have a Date column in my DataFrame. When I display the dates, the formats are mixed and look random. How can I put them into one consistent format, like dd/mm/yyyy?

This is pseudocode, since you did not give us your code. It assumes that the date column of a DataFrame df already has a datetime dtype.
You can use the vectorized .dt.strftime() method (see the docs):
df['date'].dt.strftime("%d/%m/%Y")
To keep the new format, assign the result back to the date column. Note that the column then holds strings, not datetimes:
df['date'] = df['date'].dt.strftime("%d/%m/%Y")
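A minimal runnable sketch of the two steps, assuming the raw column holds strings in a single known format (the sample values below are hypothetical, since the question does not show the real data). The column is parsed with pd.to_datetime first, because .dt.strftime only works on datetime columns:

```python
import pandas as pd

# Hypothetical sample data; the question does not show the real column
df = pd.DataFrame({"date": ["2021-03-05", "2021-12-25", "2022-01-09"]})

# Step 1: make sure the column really is datetime64 (the answer assumes this)
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Step 2: render each value as a dd/mm/yyyy string (the column becomes strings)
df["date"] = df["date"].dt.strftime("%d/%m/%Y")
print(df["date"].tolist())  # ['05/03/2021', '25/12/2021', '09/01/2022']
```

If the raw strings really come in several different layouts, a single format= will not match them all, and you would need to parse each layout separately.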

Related

How do I import a column as datetime.date?

I have a CSV dataset whose first column contains dates (not datetimes, just dates).
The CSV looks like this:
date,text
2005-01-01,"FOO-BAR-1"
2005-01-02,"FOO-BAR-2"
If I do this:
df = pd.read_csv('mycsv.csv')
and then run:
print(df.dtypes)
I get:
date object
text object
dtype: object
How can I get the date column as datetime.date?

Use:
df = pd.read_csv('mycsv.csv', parse_dates=[0])
This way the first column will have pandas' native datetime64 type, which is used in pandas much more often than Python's datetime.date.
It is a more natural approach than converting the column after you read the DataFrame.
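A self-contained sketch of the parse_dates approach, using io.StringIO in place of mycsv.csv so it runs without a file. If you really need Python datetime.date objects, as the question asks, .dt.date converts the parsed column into an object column of dates (the date_only column name is just for illustration):

```python
import io
import pandas as pd

# Same content as the question's CSV, inlined so the example is runnable
csv_text = """date,text
2005-01-01,"FOO-BAR-1"
2005-01-02,"FOO-BAR-2"
"""

# parse_dates=[0] asks pandas to parse the first column as datetime64
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[0])

# Optional: Python datetime.date objects (stored in an object-dtype column)
df["date_only"] = df["date"].dt.date
```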

You can use the pd.to_datetime function available in pandas.
For example, in a dataset of cricket match scores, I can convert the MatchDate column to a datetime object by applying pd.to_datetime with the date format used in the data. (Refer to https://www.w3schools.com/python/python_datetime.asp for the format codes that match your data.)
cricket["MatchDate"] = pd.to_datetime(cricket["MatchDate"], format="%m-%d-%Y")
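A runnable version of the same call. The cricket DataFrame here is a hypothetical two-row stand-in, since the answer does not show the real dataset; the point is that format="%m-%d-%Y" must describe how the strings are currently written:

```python
import pandas as pd

# Hypothetical stand-in for the cricket dataset (month-day-year strings)
cricket = pd.DataFrame({"MatchDate": ["03-15-2019", "11-02-2019"]})

# The format string describes the input layout: month-day-year with dashes
cricket["MatchDate"] = pd.to_datetime(cricket["MatchDate"], format="%m-%d-%Y")
print(cricket["MatchDate"].dt.month.tolist())  # [3, 11]
```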

Converting dates when importing from CSV, OutOfBoundsDatetime: Out of bounds nanosecond timestamp. Pandas

I'm importing data from a csv, and I'm trying to set a specific date to today's date.
Data in the CSV is formatted this way:
All data in that column are dates and are formatted exactly the same. At the moment I read in the data with df = pd.read_csv(r'<filepath.csv>').
Then I run this to convert all instances of '7/21/2020' into today's date:
df['filedate'] = np.where(pd.to_datetime(df['filedate']) == '7/21/2020', pd.Timestamp('now').floor(freq='d'),df['filedate'])
I receive this error: pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-14 00:00:00
I don't want to use errors='coerce' because the column will always be 100% populated with real dates, and I will later need to filter the dataframe by date. There seems to be some "ghost" precision in the csv data I can't see. I cannot modify the csv column in this case and I can't use any packages outside of pandas and numpy.

...or alternatively .loc:
df.loc[df['filedate'] == '7/21/2020', 'filedate'] = pd.Timestamp('now').floor(freq='d')
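A sketch of a variant that parses the whole column once with an explicit month/day/year format and then compares against a Timestamp instead of a string. This is an assumption about what may help: an explicit format stops pandas from guessing the layout, and if some value is genuinely malformed, the error should point at it rather than being hidden. The sample values are hypothetical:

```python
import pandas as pd

# Hypothetical sample; the real CSV column is not shown in the question
df = pd.DataFrame({"filedate": ["7/20/2020", "7/21/2020", "7/21/2020"]})

# Parse with an explicit month/day/year format instead of letting pandas guess
df["filedate"] = pd.to_datetime(df["filedate"], format="%m/%d/%Y")

today = pd.Timestamp("now").floor("D")
# Compare against a Timestamp rather than a string, then overwrite matching rows
mask = df["filedate"] == pd.Timestamp(2020, 7, 21)
df.loc[mask, "filedate"] = today
```

Because the column stays datetime64, it can later be filtered by date, which the question says is needed.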

Use the .replace() method, and assign the result back, because .replace() returns a new Series:
df['filedate'] = df['filedate'].replace({'7/21/2020': pd.Timestamp('now').floor(freq='d')})
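A small sketch showing why the assignment matters: .replace() leaves the original column untouched until the result is assigned back. The data is hypothetical and deliberately kept as plain Python strings (object dtype), as a column read from a CSV would typically be; the mixed result (strings plus a Timestamp) also stays object dtype:

```python
import pandas as pd

# Hypothetical sample stored as plain Python strings (object dtype)
df = pd.DataFrame({"filedate": pd.Series(["7/20/2020", "7/21/2020"], dtype=object)})
today = pd.Timestamp("now").floor("D")

# replace() returns a new Series; df itself is not modified by this call alone
replaced = df["filedate"].replace({"7/21/2020": today})
unchanged_before_assign = df["filedate"].iloc[1]  # still the string '7/21/2020'

# Assign the result back to actually change the column
df["filedate"] = replaced
```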

Convert a series of dates in format YYYYMMDD in a dataframe of massive data

Hi, I'm trying to convert a field in a pandas DataFrame to dates. The field holds dates formatted as YYYYMMDD.
I have tried:
pd.to_datetime('20180331').strftime('%Y:%m:%d')
but it only works for a single value, not for a whole Series. My dataset has 500,000 rows, so applying it with a lambda would be slow.
Thanks for the help!

Assuming your column is df['col']:
pd.to_datetime(df['col'], format='%Y%m%d')
See the pandas.to_datetime documentation.
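A sketch of the vectorized call on a whole column. It assumes, hypothetically, that the values were read as integers (as read_csv often does for YYYYMMDD data), so they are turned into strings first to match the %Y%m%d format unambiguously. The last line is optional and only reproduces the colon-separated layout from the question:

```python
import pandas as pd

# Hypothetical column; YYYYMMDD values read from a CSV are often integers
df = pd.DataFrame({"col": [20180331, 20190102, 20201231]})

# Convert to str first, then parse the whole Series in one vectorized call
df["col"] = pd.to_datetime(df["col"].astype(str), format="%Y%m%d")

# Optional: render back as strings in the question's YYYY:MM:DD layout
df["col_str"] = df["col"].dt.strftime("%Y:%m:%d")
```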

Faster solution for date formatting

I am trying to change the format of the date in a pandas dataframe.
If I check the first date, I get:
df['Date'][0]
Out[158]: '01/02/2008'
Then, I use:
df['Date'] = pd.to_datetime(df['Date']).dt.date
which changes it to:
df['Date'][0]
Out[157]: datetime.date(2008, 1, 2)
However, this takes a very long time, since my DataFrame has millions of rows.
All I want to do is change the date format from MM/DD/YYYY to YYYY-MM-DD.
How can I do it in a faster way?

You should first collapse by Date using the groupby method to reduce the dimensionality of the problem.
Then you parse the dates into the new format and merge the results back into the original DataFrame.
This requires some time because of the merging, but it takes advantage of the fact that many dates are repeated a large number of times: you want to convert each date only once!
You can use the following code:
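import pandas as pd
from datetime import datetime  # pd.datetime was deprecated in pandas 1.0 and later removed
date_parser = lambda x: datetime.strptime(str(x), '%m/%d/%Y')
df['date_index'] = df['Date']
# Parse each distinct date string only once
dates = df.groupby(['date_index']).first()['Date'].apply(date_parser)
df = df.set_index(['date_index'])
# Assignment aligns on the index, so each parsed date reaches every matching row
df['New Date'] = dates
df = df.reset_index()
df.head()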
In my case, the execution time for a DataFrame with 3 million lines reduced from 30 seconds to about 1.5 seconds.
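The same "convert each distinct date once" idea can be sketched without the groupby/set_index round trip by parsing the unique strings and mapping them back. This is a swapped-in technique, not the answerer's code, and the data is hypothetical. Also note that pd.to_datetime has a cache parameter (default True in recent pandas) that already de-duplicates repeated strings internally, so passing an explicit format is often the bigger win:

```python
import pandas as pd

# Hypothetical data with many repeated date strings
df = pd.DataFrame({"Date": ["01/02/2008", "01/02/2008", "03/15/2009", "01/02/2008"]})

# Parse each distinct string once...
uniques = pd.Series(df["Date"].unique())
lookup = dict(zip(uniques, pd.to_datetime(uniques, format="%m/%d/%Y")))

# ...then broadcast the parsed values back to every row
df["New Date"] = df["Date"].map(lookup)
```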

I'm not sure if this will help with the performance issue, as I haven't tested with a dataset of your size, but at least in theory, this should help. Pandas has a built-in parameter you can use to specify that it should load a column as a date or datetime field. See the parse_dates parameter in the pandas documentation.
Simply pass in a list of columns that you want to be parsed as a date and pandas will convert the columns for you when creating the DataFrame. Then, you won't have to worry about looping back through the dataframe and attempting the conversion after.
import pandas as pd
df = pd.read_csv('test.csv', parse_dates=[0,2])
The above example would try to parse the columns at zero-based positions 0 and 2 (the 1st and 3rd columns) as dates.
Each value in the resulting columns will be a pandas Timestamp, which you can then format however you like when working with the DataFrame.
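A self-contained sketch of parse_dates=[0, 2], with a hypothetical three-column CSV inlined via io.StringIO. Only the first and third columns become datetime64; the middle column is left alone. The strftime line shows one way to print the parsed values in another layout:

```python
import io
import pandas as pd

# Hypothetical CSV: dates in the 1st and 3rd columns, text in the 2nd
csv_text = """start,name,end
2008-01-02,a,2008-02-03
2009-03-15,b,2009-04-16
"""

# Parse zero-based columns 0 and 2 as dates while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[0, 2])

# Format the parsed Timestamps however you like, e.g. dd/mm/yyyy
start_strings = df["start"].dt.strftime("%d/%m/%Y").tolist()
```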

Following a lead in @pygo's comment, I found that my mistake was reading the data with
df['Date'] = pd.to_datetime(df['Date']).dt.date
As this answer explains, this is slow because:
This is because pandas falls back to dateutil.parser.parse for parsing the strings when it has a non-default format or when no format string is supplied (this is much more flexible, but also slower).
As you have shown above, you can improve the performance by supplying a format string to to_datetime. Another option is to use infer_datetime_format=True.
(In pandas 2.0 and later, infer_datetime_format is deprecated, because a consistent format is now inferred by default.)
When using any of the date parsers from the answers above, pandas falls into the slow element-by-element parsing loop. The same happens when we pass pd.to_datetime the format we want (instead of the format we have).
Hence, instead of doing
df['Date'] = pd.to_datetime(df['Date'],format='%Y-%m-%d')
or
df['Date'] = pd.to_datetime(df['Date']).dt.date
we should do
df['Date'] = pd.to_datetime(df['Date'],format='%m/%d/%Y').dt.date
Supplying the format the data is currently in lets pandas read it into datetime very quickly. Then .dt.date converts the values to dates, which display as YYYY-MM-DD, without going through the parser again.
Thank you to everyone who helped!
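A short sketch of the accepted approach on hypothetical data: tell pd.to_datetime the current MM/DD/YYYY layout, then either take .dt.date (Python date objects in an object column) or, if you only need text, .dt.strftime for plain YYYY-MM-DD strings. Which one you want depends on whether later code needs real dates or just the display:

```python
import pandas as pd

# Hypothetical data in the question's MM/DD/YYYY layout
df = pd.DataFrame({"Date": ["01/02/2008", "12/31/2009"]})

# Describe the format the strings are *currently* in
parsed = pd.to_datetime(df["Date"], format="%m/%d/%Y")

df["Date"] = parsed.dt.date                      # datetime.date objects
df["Date_str"] = parsed.dt.strftime("%Y-%m-%d")  # plain YYYY-MM-DD strings
```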

Subset of Dataframe based on substring (python)

I have a pandas DataFrame called Data filled with dates. An example date might look like: "2015-05-10 23:45:00". I want to look at the January data only, so I tried:
Data= Data[:][5:7]=="01"
This doesn't work though.
TL;DR: how do I get a subset of a DataFrame based on a substring?
Thanks!

Consider using a bracketed (boolean) filter on the datetime's month value. But first, you will need to convert the string dates to datetime, which pandas' to_datetime() can handle:
import pandas as pd
...
Data['yourdatetimecolumn'] = pd.to_datetime(Data['yourdatetimecolumn'])
JanData = Data[Data['yourdatetimecolumn'].dt.month==1]
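A runnable sketch of the month filter, plus a literal substring filter for comparison since the question asked about substrings. The column name when and the sample rows are hypothetical. Option 1 is the answer's approach; option 2 slices characters 5 to 6 (the month) out of the original strings and only works while every value has exactly this layout:

```python
import pandas as pd

# Hypothetical data; "when" stands in for your datetime column
Data = pd.DataFrame({
    "when": ["2015-01-10 23:45:00", "2015-05-10 23:45:00", "2016-01-03 08:00:00"],
    "value": [1, 2, 3],
})

# Option 1 (the answer's approach): parse, then filter on the month number
when = pd.to_datetime(Data["when"], format="%Y-%m-%d %H:%M:%S")
jan_data = Data[when.dt.month == 1]

# Option 2: a literal substring filter on the original strings
jan_data_str = Data[Data["when"].str[5:7] == "01"]
```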

Since your question is about dates, you may want to start by looking at this question and giving it a try:
Parse a Pandas column to Datetime

Develop Reference
