defining month first dateformat in pandas? - python

How can I define a month-first date format in pandas?
For a day-first format I use the dayfirst argument:
dateCols = ['Document Date']
data = pd.read_excel(os.path.join(delivery_path, f), parse_dates=dateCols,
                     dayfirst=True, sheet_name='Refined', skiprows=1)
There is no monthfirst argument. How should I specify month-first when reading the file? Also, what is the default date format pandas uses when reading date columns?
e.g. October 1st = 10/01/2019

It isn't clear whether the values in your date column look like October 1st=10/01/2019 or just 10/01/2019. If they look like October 1st=10/01/2019:
import pandas as pd
def clean(date_value):
    # keep only the part after the '=' sign
    return str(date_value).split('=')[1]
data['Document Date'] = pd.to_datetime(data['Document Date'].apply(clean), format='%m/%d/%Y')
If they look like 10/01/2019:
data['Document Date'] = pd.to_datetime(data['Document Date'], format='%m/%d/%Y')
You can learn more about the format codes at http://strftime.org/
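On the question of the default: as far as I know, pd.to_datetime treats an ambiguous string as month-first unless told otherwise (dayfirst defaults to False), which would explain why there is no monthfirst option. A minimal sketch with made-up sample values, not your actual file:

```python
import pandas as pd

# Hypothetical sample values; both are ambiguous between day-first and month-first.
s = pd.Series(["10/01/2019", "10/02/2019"])

# Default behaviour (dayfirst=False): the first field is read as the month.
month_first = pd.to_datetime(s)

# dayfirst=True flips the preference: the first field is read as the day.
day_first = pd.to_datetime(s, dayfirst=True)

# An explicit format removes any guessing.
explicit = pd.to_datetime(s, format="%m/%d/%Y")

print(month_first.dt.strftime("%Y-%m-%d").tolist())
print(day_first.dt.strftime("%Y-%m-%d").tolist())
```

If the Excel cells are stored as real dates rather than text, pandas should read them as datetimes directly and the string format would not matter.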

Related

group by with year of the date

I have a date column in Excel in year/month/day format. I want to extract only the year from each date and group the column by year, but I get an error.
df.index = pd.to_datetime(df[18], format='%y/%m/%d %I:%M%p')
df.groupby(by=[df.index.year])
18 is the index of my date column.
Error: ValueError: time data '2022/04/23' does not match format '%y/%m/%d %I:%M%p' (match)
I don't know how to fix it.
The error message says that the format string you are using, %y/%m/%d %I:%M%p, doesn't match the format of the dates in your column.
Your dates are in YYYY/MM/DD format, but the format string expects a two-digit year (%y) followed by a 12-hour time with AM/PM (%I:%M%p).
Change the format string to %Y/%m/%d:
df.index = pd.to_datetime(df[18], format='%Y/%m/%d')
Then you can extract the year using the year attribute of the datetime index, and group by the year as you are already doing.
Make sure every value in your date column follows that format. Here is a small example of converting dates written this way:
import pandas as pd
df = pd.DataFrame({'date': ['2022/04/23', '2022/04/24', '2022/04/25']})
df['date'] = pd.to_datetime(df['date'], format='%Y/%m/%d')
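To round this off with the grouping step, here is a small sketch. The 'value' column and the numbers in it are invented for illustration; the idea is only that .dt.year can be passed straight to groupby once the column is a real datetime.

```python
import pandas as pd

# Hypothetical data: the 'value' column is made up for the example.
df = pd.DataFrame({
    "date": ["2021/01/05", "2022/04/23", "2022/04/24"],
    "value": [1, 2, 3],
})

# Parse with a four-digit year (%Y), matching the YYYY/MM/DD strings.
df["date"] = pd.to_datetime(df["date"], format="%Y/%m/%d")

# Group by the year component and aggregate.
yearly = df.groupby(df["date"].dt.year)["value"].sum()
print(yearly)
```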

change YYYYDDMM to YYYYMMDD in python

I have a df with dates in a column converted to datetime. The current format is YYYYDDMM and I need it converted to YYYYMMDD. I tried the code below, but it does not change the format and still gives me YYYYDDMM. The end goal is to subtract 1 business day from the effective date, but the format needs to be YYYYMMDD for that, otherwise it subtracts 1 day from the month and not the day. Can someone help?
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'])
# Effective Date = 20220408 (4th Aug 2022 for clarity)
filtered_df['Effective Date new'] = filtered_df['Effective Date'].dt.strftime("%Y%m%d")
# Effective Date new = 20220408
Desired output: Effective Date new = 20220804
By default, pd.to_datetime reads an undelimited eight-digit value as YYYYMMDD, so formatting it back with %Y%m%d prints the same thing. You can try the dayfirst keyword argument, but it is only a preference and may not change how a string without separators is parsed; if it has no effect, passing format='%Y%d%m' states the layout explicitly.
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'], dayfirst=True)
I like to use the datetime library for this. You can use strptime to convert a string into a datetime object and strftime to convert the datetime object to the new string. This assumes Effective Date still holds the original strings.
from datetime import datetime
def change_date(row):
    # parse as year-day-month, then write back as year-month-day
    row["Effective Date new"] = datetime.strptime(row["Effective Date"], "%Y%d%m").strftime("%Y%m%d")
    return row
df2 = df.apply(change_date, axis=1)
The output df2 will have Effective Date new as a new column.
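A vectorized variant of the same idea, as a sketch: it assumes the column still contains the raw YYYYDDMM strings (the sample values below are made up) and uses pd.offsets.BDay for the business-day step mentioned in the question.

```python
import pandas as pd

# Hypothetical sample: raw strings in YYYYDDMM order.
filtered_df = pd.DataFrame({"Effective Date": ["20220408", "20220108"]})

# State the layout explicitly: year, then day, then month.
parsed = pd.to_datetime(filtered_df["Effective Date"], format="%Y%d%m")

# Same dates written back as YYYYMMDD strings.
filtered_df["Effective Date new"] = parsed.dt.strftime("%Y%m%d")

# Subtract one business day from the parsed dates (weekends are skipped).
filtered_df["Previous business day"] = parsed - pd.offsets.BDay(1)

print(filtered_df)
```

Doing the arithmetic on the parsed datetimes rather than on the formatted strings avoids depending on how the date happens to be printed.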

Bad datetime conversion in pandas when a csv file it's opened

I have a simple CSV with a Date and an Activity column, like this:
When I open it with pandas and convert the Date column with pd.to_datetime, the dates change. When the month changes, like this:
it seems that pandas swaps the day and the month, or something like that:
The date format I want is dd-mm-yyyy or yyyy-mm-dd.
This is the code I'm using:
import pandas as pd
dataset = pd.read_csv(directory + "Time 2020 (Activities).csv", sep = ";")
dataset[["Date"]] = dataset[["Date"]].apply(pd.to_datetime)
How can I fix that?
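You can specify the date format explicitly with the format parameter of pd.to_datetime. The format string has to match how the dates are written in the file, so adjust it if yours differ: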
dataset['Date'] = pd.to_datetime(dataset['Date'], format='%Y-%m-%d')
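Since the sample data from the question is not shown here, the following sketch assumes the file stores dates as dd/mm/yyyy text; the rows are invented. It parses with a matching format and then writes the dates out as dd-mm-yyyy strings.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file (assumed dd/mm/yyyy).
csv_text = "Date;Activity\n31/01/2020;Reading\n01/02/2020;Running\n"
dataset = pd.read_csv(io.StringIO(csv_text), sep=";")

# Parse day-first explicitly so 01/02/2020 becomes 1 February, not 2 January.
dataset["Date"] = pd.to_datetime(dataset["Date"], format="%d/%m/%Y")

# A datetime column displays as yyyy-mm-dd; for dd-mm-yyyy, format to strings.
dataset["Date text"] = dataset["Date"].dt.strftime("%d-%m-%Y")

print(dataset)
```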

Making both day-first and month-first dates in a csv file day-first

I have a csv file that has a column of dates. The dates are in order of month - so January comes first, then Feb, and so on. The problem is some of the dates are in mm/dd/yyyy format and others in dd/mm/yyyy format. Here's what it looks like.
Date
01/08/2005
01/12/2005
15/01/2005
19/01/2005
22/01/2005
26/01/2005
29/01/2005
03/02/2005
05/02/2005
...
I would like to bring all of them to the same format (dd/mm/yyyy)
I am using Python and pandas to read and edit the csv file. I tried using Excel to manually change the date formats using the built-in formatting tools but it seems impossible with the large number of rows. I'm thinking of using regex but I'm not quite sure how to distinguish between month-first and day-first.
# here's what I have so far
import re
pattern = r'\d\d/\d\d/\d\d'
for i in df.index:
    date = df.loc[i, 'Date']
    match = re.search(pattern, date)
    if match:
        date_items = date.split('/')
        day = date_items[1]
        month = date_items[0]
        year = date_items[2]
        # this swaps every row, including the ones already in dd/mm/yyyy
        new_date = f'{day}/{month}/{year}'
        df.loc[i, 'Date'] = new_date
I want the csv to have a uniform date format in the end.
In short: you can't!
There's no way for you to know if 01/02/2019 is Jan 2nd or Feb 1st!
Same goes for other dates in your examples such as:
01/08/2005
01/12/2005
03/02/2005
05/02/2005
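What you can do is separate the rows that are decidable from the ones that are not. A rough sketch, using a few of the dates from the question: a first field above 12 can only be a day, a second field above 12 means the first must be a month, and everything else stays ambiguous. The label names are just for illustration.

```python
import pandas as pd

# A few of the dates from the question.
dates = pd.Series(["01/08/2005", "01/12/2005", "15/01/2005", "19/01/2005", "03/02/2005"])

# Split into three integer fields: first, second, year.
parts = dates.str.split("/", expand=True).astype(int)
first, second = parts[0], parts[1]

# Start by assuming nothing can be decided, then mark the clear cases.
kind = pd.Series("ambiguous", index=dates.index)
kind[first > 12] = "day-first"      # e.g. 15/01/2005 cannot be month 15
kind[second > 12] = "month-first"   # e.g. 01/15/2005 cannot be month 15

print(pd.DataFrame({"Date": dates, "kind": kind}))
```

The ambiguous rows would still need outside information, for example the month ordering of the file, to be resolved.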

Extract year from date column in dataframe having "different date format" - python

I have dates in different formats in the date column of a dataframe,
like this:
print(df['date'].head(15))
5/27/1972
12/15/1979
10/11/1972
9/15/1992
12/9/1980
0000-00-00
2000-00-00
1988-00-00
0000-00
2000-10-10
6/25/1976
6/6/1987
8/24/1987
0000-00-00
2000-00-00
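How can I get the year in a separate column of the pandas dataframe?
First convert the column to datetime, then extract the year from it. Values that are not valid dates, such as 0000-00-00 or 2000-00-00, become NaT and get no year.
import pandas as pd
# parse each value on its own so the mixed formats are handled; invalid dates become NaT
parsed = df['date'].apply(lambda x: pd.to_datetime(x, errors='coerce'))
df['year'] = parsed.dt.year
The following approach also solves the issue by pulling out the first four-digit run, so it keeps the year even for values like 2000-00-00: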
df['year'] = df.date.str.extract(r'([0-9][0-9][0-9][0-9])', expand=True)
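A slightly extended sketch of the regex approach, on a subset of the sample values: it converts the extracted text to numbers and treats the placeholder year 0000 as missing. Treating 0000 as missing is my assumption about what is wanted, not something stated in the question.

```python
import pandas as pd

# A subset of the values shown in the question.
df = pd.DataFrame({"date": ["5/27/1972", "12/15/1979", "0000-00-00",
                            "2000-00-00", "0000-00", "2000-10-10"]})

# Grab the first run of four digits, wherever it appears in the string.
digits = df["date"].str.extract(r"(\d{4})", expand=False)

# Convert to numbers and blank out the 0000 placeholder (assumed to mean "unknown").
years = pd.to_numeric(digits)
df["year"] = years.where(years > 0).astype("Int64")

print(df)
```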
