group by with year of the date - python

I have a date column in Excel in year_month_day format. I want to extract only the year from the date and group by year, but I get an error:
df.index = pd.to_datetime(df[18], format='%y/%m/%d %I:%M%p')
df.groupby(by=[df.index.year])
18 is the index of my date column.
The error is: ValueError: time data '2022/04/23' does not match format '%y/%m/%d %I:%M%p' (match)
I don't know how to fix it.

The error message says that the format string you are using, %y/%m/%d %I:%M%p, does not match the dates in your column.
Your dates look like YYYY/MM/DD with no time part, but %y expects a two-digit year and %I:%M%p expects a 12-hour time with AM/PM.
Change the format string to %Y/%m/%d:
df.index = pd.to_datetime(df[18], format='%Y/%m/%d')
Then you can extract the year using the year attribute of the datetime index, and group by the year as you are doing.
Make sure your date column is formatted consistently. Here is a small example that parses dates in this format:
import pandas as pd
df = pd.DataFrame({'date': ['2022/04/23', '2022/04/24', '2022/04/25']})
df['date'] = pd.to_datetime(df['date'], format='%Y/%m/%d')
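To show the grouping step as well, here is a minimal sketch. The value column and the sample rows are made up for illustration, and it assumes the dates live in a regular column rather than the index.

```python
import pandas as pd

# Hypothetical sample data: 'value' is an invented column to aggregate
df = pd.DataFrame({
    "date": ["2021/12/31", "2022/04/23", "2022/04/24"],
    "value": [1, 2, 3],
})

# Parse with a four-digit year (%Y) and no time component
df["date"] = pd.to_datetime(df["date"], format="%Y/%m/%d")

# Group by the year extracted from the datetime column and sum per year
totals = df.groupby(df["date"].dt.year)["value"].sum()
print(totals)
```

If the dates are in the index instead, grouping by df.index.year should work the same way.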

Related

change YYYYDDMM to YYYYMMDD in python

I have a df with dates in a column converted to a datetime. The current format is YYYYDDMM, and I need it converted to YYYYMMDD. I tried the code below, but it does not change the format and still gives me YYYYDDMM. The end goal is to subtract 1 business day from the effective date, but the format needs to be YYYYMMDD for that, otherwise it subtracts 1 day from the month and not the day. Can someone help?
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'])
# Effective Date = 20220408 (4th Aug 2022 for clarity)
filtered_df['Effective Date new'] = filtered_df['Effective Date'].dt.strftime("%Y%m%d")
# Effective Date new = 20220408
Desired output: Effective Date new = 20220804
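By default, to_datetime interprets the input YYYYDDMM as YYYYMMDD, so formatting it back with %Y%m%d prints the same thing. Once the column is a datetime, the day and month are already swapped, so the fix has to be applied when the original values are parsed. The dayfirst keyword argument is a parsing hint rather than a strict rule, and a compact eight-digit value may still be read as YYYYMMDD, so it is more reliable to pass the format explicitly:
filtered_df['Effective Date'] = pd.to_datetime(filtered_df['Effective Date'], format='%Y%d%m')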
I like to use the datetime library for this purpose. You can use strptime to convert a string into the datetime object and strftime to convert your datetime object to the new string.
from datetime import datetime
def change_date(row):
    row["Effective Date new"] = datetime.strptime(row["Effective Date"], "%Y%d%m").strftime("%Y%m%d")
    return row
df2 = df.apply(change_date, axis=1)
The output df2 will have Effective Date new as your new column.
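Since the end goal in the question is subtracting one business day, here is a rough sketch of the whole path. It assumes the raw values are YYYYDDMM strings (the sample values are invented) and uses pandas' BDay offset, which only skips weekends and knows nothing about holidays.

```python
import pandas as pd

# Invented sample values in YYYYDDMM order: 4 Aug 2022 and 15 Aug 2022
raw = pd.Series(["20220408", "20221508"], name="Effective Date")

# Tell pandas explicitly that the day comes before the month
parsed = pd.to_datetime(raw, format="%Y%d%m")

# Step back one business day (weekends skipped, holidays not considered)
previous_business_day = parsed - pd.offsets.BDay(1)

# Render as YYYYMMDD strings only at the end, for display or export
formatted = previous_business_day.dt.strftime("%Y%m%d")
print(formatted)
```

Doing the arithmetic on the datetime column and formatting only at the end avoids working with strings in between.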

What is the correct format code for this type of date?

I have a numpy array (called dates) of dates (as strings) which I thought were in the form %Y-%m-%d %H:%M:%S. However, I get an error showing that some dates look like 2021-05-11T00:00:00.0000000. I am not sure where the additional 'T' comes from or why the time is so precise.
I am trying to get rid of the time and only have the date.
My code is here:
dates = dataset.iloc[:,0].to_numpy()
newDates = []
for i in range(0,len(dates)):
    newDates.append(datetime.strptime(dates[i], '%Y-%m-%dT%H:%M:%S.%f'))
    newDates[i] = newDates[i].strftime('%Y-%m-%d')
dates = newDates
I get an error saying "ValueError: unconverted data remains: 0".
If I wrote instead
newDates.append(datetime.strptime(dates[i], '%Y-%m-%dT%H:%M:%S%f'))
I get an error "ValueError: unconverted data remains: .0000000".
In which format should the date be given?
If the datetimes are in a DataFrame, you can use pd.to_datetime and Series.dt.strftime to convert them to the desired format. pandas does it all for you, so there is no need to convert the values to a numpy array first.
import pandas as pd
# example df
df = pd.DataFrame({'datetime': ['2021-05-11T00:00:00.0000000' ,
'2021-05-20T00:00:00.0000000' ,
'2021-06-24T00:00:00.0000000']})
df['datetime'] = pd.to_datetime(df['datetime']).dt.strftime('%Y-%m-%d')
print(df)
datetime
0 2021-05-11
1 2021-05-20
2 2021-06-24
Does this help? https://strftime.org/
The extra T is a literal separator between the date and the time, so it has to appear in the format string right after %Y-%m-%d.
If you just want to get the date, just split the string like this.
date = date.split('T')[0]
This first splits the date string into two parts,
['2021-05-11', '00:00:00.0000000']
and taking index 0 keeps only the first element of that list,
so you are just left with
date = '2021-05-11'
dates = dataset.iloc[:,0].to_numpy()
newDates = []
for i in dates:
    newDates.append(i.split('T')[0])
dates = newDates
assuming dates is a list
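For anyone who wants to stay with strptime: the "unconverted data remains: 0" error comes from %f, which accepts at most six fractional digits, while these strings carry seven. A small sketch, assuming every string has the same fixed layout as the examples in the question, is to cut the string to 26 characters before parsing:

```python
from datetime import datetime

# Sample strings copied from the layout in the question (7 fractional digits)
raw = ["2021-05-11T00:00:00.0000000", "2021-05-20T00:00:00.0000000"]

# "YYYY-MM-DDTHH:MM:SS." is 20 characters; 6 more digits is the most %f can read
MAX_LEN = 26

dates = [
    datetime.strptime(s[:MAX_LEN], "%Y-%m-%dT%H:%M:%S.%f").strftime("%Y-%m-%d")
    for s in raw
]
print(dates)
```

This still validates the date part, which the plain split approach does not do.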

how to change date format where the source contain multiple format

How can I change the date format from 12-Mar-2022 to '%d/%m/%Y' in Python?
The problem is that I read data from a Google Sheet where the dates come in multiple formats: some are 12/03/2022 and some are 12-Mar-2022.
I tried the following and got an error, of course, because the format does not match 12-Mar-2022:
defectData_x['date'] = pd.to_datetime(defectData_x['date'], format='%d/%m/%Y')
Appreciate your help
defectData_x['date1'] = defectData_x['date'].dt.strftime('%d/%m/%Y')
Don't forget that date1's dtype is object (strings), not datetime,
so it is better to keep both the date column and the date1 column until you have the final result.
After that, you can drop the date column.
Here is my example:
import pandas as pd
df = pd.DataFrame(["12/03/2022", "12-Mar-2022"], columns=["date"])
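# dayfirst=True reads 12/03/2022 as 12 March; format="mixed" requires pandas 2.0 or later (omit it on older versions)
df["date1"] = pd.to_datetime(df["date"], format="mixed", dayfirst=True)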
df['date2'] = df['date1'].dt.strftime('%d/%m/%Y')
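If you would rather not depend on the pandas version, another way is to parse each known format separately and combine the results. This is only a sketch: it assumes the sheet contains exactly these two formats and that month abbreviations are in English.

```python
import pandas as pd

df = pd.DataFrame({"date": ["12/03/2022", "12-Mar-2022"]})

# Parse each known format on its own; rows that do not match become NaT
slash_dates = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")
named_dates = pd.to_datetime(df["date"], format="%d-%b-%Y", errors="coerce")

# Take the slash-format result and fill the gaps from the month-name result
df["parsed"] = slash_dates.fillna(named_dates)

# String version in the target layout (dtype object, not datetime)
df["formatted"] = df["parsed"].dt.strftime("%d/%m/%Y")
print(df)
```

Any row that matches neither format stays NaT in parsed, which makes bad input easy to spot.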

change timestamp format pandas

How can I change a timestamp in the format '2019-12-16-12-40-53' to '2019-12-16 12:40:53'?
I tried
df['timeStamp'] = df['timeStamp'].apply(lambda x:
dt.datetime.strptime(x,'%Y%b%d:%H:%M:%S'))
and I got the error
ValueError: time data '2019-12-16-12-40-53' does not match format '%Y%b%d:%H:%M:%S'
I am attaching an image of data that I am using as timestamp.
Use to_datetime and change the format: use %m to match months given as numbers, and put - between all parts of the datetime:
df['timeStamp'] = pd.to_datetime(df['timeStamp'], format='%Y-%m-%d-%H-%M-%S')
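A self-contained sketch of the same idea, with made-up sample rows. After parsing, the column is a real datetime; the last step is only needed if you want the values back as text.

```python
import pandas as pd

# Made-up sample rows in the layout from the question
df = pd.DataFrame({"timeStamp": ["2019-12-16-12-40-53", "2019-12-16-12-41-07"]})

# Every part is numeric and separated by '-', so the format mirrors that
df["timeStamp"] = pd.to_datetime(df["timeStamp"], format="%Y-%m-%d-%H-%M-%S")

# Optional: render as text in the requested layout
as_text = df["timeStamp"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(as_text)
```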

Extract year from date column in dataframe having "different date format" - python

I have dates in different formats in the date column of a dataframe,
like this:
print(df['date'].head(15))
5/27/1972
12/15/1979
10/11/1972
9/15/1992
12/9/1980
0000-00-00
2000-00-00
1988-00-00
0000-00
2000-10-10
6/25/1976
6/6/1987
8/24/1987
0000-00-00
2000-00-00
How can I get the year in a separate column of the pandas dataframe?
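First convert each value to a datetime, then extract the year from it. Values that are not valid dates (for example 0000-00-00 or 2000-00-00) become NaT, so their year is missing (NaN).
import pandas as pd
# Parse each value on its own so mixed formats do not interfere with each other
df['year'] = df['date'].apply(
    lambda x: pd.to_datetime(x, errors='coerce').year)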
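The following approach also solves the issue by pulling the first run of four digits out of each string (the result is a string, and zero dates give '0000'):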
df['year'] = df.date.str.extract(r'([0-9][0-9][0-9][0-9])', expand=True)
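Building on the regex idea, here is a sketch that also turns the result into numbers and treats the placeholder year 0000 as missing. The sample rows are a subset of the question's data, and treating 0000 as missing is my assumption about what is wanted.

```python
import pandas as pd

# A few rows taken from the question's sample
df = pd.DataFrame({
    "date": ["5/27/1972", "0000-00-00", "2000-00-00", "0000-00", "2000-10-10"]
})

# First run of four consecutive digits; expand=False returns a Series
year_text = df["date"].str.extract(r"(\d{4})", expand=False)

# Convert to a nullable integer column so missing years can be represented
year_num = pd.to_numeric(year_text, errors="coerce").astype("Int64")

# Assumption: a year of 0 is a placeholder, so mark it as missing
df["year"] = year_num.mask(year_num == 0)
print(df)
```

Unlike the datetime route, this keeps the year for partial dates such as 2000-00-00.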
