Convert column with month and year ("August 2020"...) to datetime - python

I have a dataframe which contains one column with the month and year as a string:
>>>time index value
January 2021 y 5
January 2021 v 8
May 2020 y 25
June 2020 Y 13
June 2020 x 11
June 2020 v 10
...
I would like to convert the "time" column to datetime format so I can sort the table in chronological order.
Is there any way to do this when the time is a string with a month name and a year?
Edit:
When I do:
result_Table['time']=pd.to_datetime(result_Table['time'],format='%Y-%m-%d')
I receive this error:
ValueError: time data January 2021 doesn't match format specified

Sample dataframe:
df=pd.DataFrame({'time':['January 2021','May 2020','June 2020']})
If you want to specify the format parameter then that should be '%B %Y' instead of '%Y-%m-%d':
df['time']=pd.to_datetime(df['time'],format='%B %Y')
#OR
#you can also simply use:
#df['time']=pd.to_datetime(df['time'])
output of df:
time
0 2021-01-01
1 2020-05-01
2 2020-06-01
For more on the format codes, see the "strftime() and strptime() Format Codes" section of the Python datetime documentation.
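Since the goal in the question is a chronological sort, here is a minimal sketch of the full round trip. The sample rows and the name result_table are made up for illustration, and %B assumes English month names (the default C locale).

```python
import pandas as pd

# Hypothetical sample mirroring the shape of the question's table
result_table = pd.DataFrame({
    "time": ["January 2021", "January 2021", "May 2020", "June 2020"],
    "index": ["y", "v", "y", "x"],
    "value": [5, 8, 25, 11],
})

# %B = full month name, %Y = four-digit year; the day defaults to the 1st
result_table["time"] = pd.to_datetime(result_table["time"], format="%B %Y")

# Chronological sort; kind="stable" keeps the original order of tied rows
result_table = (
    result_table.sort_values("time", kind="stable").reset_index(drop=True)
)
print(result_table)
```

Sorting the original strings instead would order them alphabetically (January, June, May), which is why the conversion comes first.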

Related

Change datatype of a column that strictly only holds year in the format yyyy

df.assign(Year=pd.to_datetime(df.Year, format='%Y')).set_index('Year')
Suppose df['Year'] has n rows with years in YYYY format. How can I change this to a datetime format without adding a month and day? The code above converts YYYY to 2015-01-01.
You might be looking for a Period:
df.assign(year=pd.to_datetime(df['Year'],format='%Y').dt.to_period('Y')).set_index('year')
Or with PeriodIndex:
df.assign(year=pd.PeriodIndex(df['Year'], freq='Y')).set_index('year')
Alternatively, extract the year using the dt accessor:
# Year with capital Y is column in DF
# year with small letter y is a calculated year and is index
df=df.assign(year=pd.to_datetime(df['Year'],format='%Y').dt.year).set_index('year')
Year height
year
2014 2014 175
2014 2014 180
2014 2014 160

Select last day of ISO-8601 week (and retain only year and week as object)

We have this df:
df = pd.DataFrame({
'date': [pd.Timestamp('2020-12-26'), # week 52 of year 2020
pd.Timestamp('2020-12-27'), # last day of week 52 of year 2020
pd.Timestamp('2021-03-10'), # week 10 of year 2021
pd.Timestamp('2022-01-03'), # first day of week 1 of year 2022
pd.Timestamp('2022-01-09')], # last day of week 1 of year 2022
'value' : [15, 15.5, 26, 36, 36.15]
})
We want a new df that looks like this:
date value
0 202052 15.50
1 202201 36.15
In other words we need to:
(1) convert 'date' to the format year/week number (and store the result as an object)
(2) select only the rows whose date corresponds to the last day of the week
Note that both (1) and (2) need to follow the ISO-8601 definition of weeks. The actual dataset has thousands of rows.
How do we do this?
You can work directly on the column through its dt accessor: dt.isocalendar() returns the ISO year, week and day for each date. In ISO-8601 the last day of the week is Sunday, which corresponds to day 7, so an equality check finds those rows.
iso = df.date.dt.isocalendar()
mask = iso.day == 7
df.loc[mask].assign(date=iso.year.astype(str) + iso.week.astype(str).str.rjust(2, "0"))
date value
1 202052 15.50
4 202201 36.15

how to change the month name column to month number column in dataframe in pandas

I have one column named Month number, with values such as Jan, Feb, Mar...
and I want to change the whole column to numbers like 01, 02, 03...
How can I do this in Python? When I use strptime, I get an error like "strptime() argument 1 must be str, not Series".
Assuming the following df:
import pandas as pd
from time import strptime
df = pd.DataFrame({'month':['Jan','Feb','Mar']})
month
0 Jan
1 Feb
2 Mar
strptime works on one string at a time, not on a whole Series, so apply it to each element. strptime(x, '%b') parses an abbreviated month name, and tm_mon gives its number.
df['month_number'] = [strptime(str(x), '%b').tm_mon for x in df['month']]
Result:
month month_number
0 Jan 1
1 Feb 2
2 Mar 3
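As an alternative to looping with strptime, a vectorized sketch using pd.to_datetime is shown below. It also produces the zero-padded strings (01, 02, 03) the question asked for. The column name month_str is made up here, and %b assumes English month abbreviations.

```python
import pandas as pd

df = pd.DataFrame({"month": ["Jan", "Feb", "Mar"]})

# Only the month is given, so the year defaults to 1900 and the day to 1
parsed = pd.to_datetime(df["month"], format="%b")

df["month_number"] = parsed.dt.month        # integers: 1, 2, 3
df["month_str"] = parsed.dt.strftime("%m")  # strings: "01", "02", "03"
print(df)
```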

Extracting month from timestamp by specifying the date format of the timestamp in Python

I have a data set with a timestamp in format dd/mm/yyyy hh:mm:ss. I would like to extract the month and the year for the whole column. So I used the following code:
Extracting the year
`df['Year'] = pd.DatetimeIndex(df['timestamp']).year`
Extracting the month
`df['month_num'] = pd.DatetimeIndex(df['timestamp']).month`
Converting number of month in name of month
`df['Month'] = df['month_num'].apply(lambda x: calendar.month_abbr[x])`
`df.drop(['month_num'], axis=1, inplace=True)`
However, this sometimes returns the wrong month: sometimes the month is taken from the second pair of digits (as if the format were dd/mm/yyyy, which it is), and sometimes from the first pair (as if the format were mm/dd/yyyy, which it is not). As you can see below, it returns 'Feb' for what should be 'Jan', although 'Dec' is correct.
`02/01/2020 12:07:00 EURUSD EUR 138,476.70 2020 Feb`
`02/01/2020 12:02:12 GBPHKD GBP 13,545.93 2020 Feb`
`31/12/2019 16:35:48 GBPUSD USD 537.60 2019 Dec`
`31/12/2019 16:29:34 GBPHKD HKD 279.17 2019 Dec`
I also tried changing the original timestamp format to yyyy-mm-dd, but after the change it keeps taking the month in the wrong order.
Any idea for this? Cheers!
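Make sure the date column is a proper datetime first. Pass the format explicitly (or use dayfirst=True) so that 02/01/2020 is parsed as 2 January rather than 1 February:
df[0] = pd.to_datetime(df[0], format='%d/%m/%Y %H:%M:%S')
Then use dt.strftime('%b') with assign, keeping the returned dataframe:
df = df.assign(year = df[0].dt.year,
month = df[0].dt.strftime('%b'))
print(df)
0 1 2 3 4 year month
0 2020-01-02 12:07:00 EURUSD EUR 138,476.70 2020 Jan
1 2020-01-02 12:02:12 GBPHKD GBP 13,545.93 2020 Jan
2 2019-12-31 16:35:48 GBPUSD USD 537.60 2019 Dec
3 2019-12-31 16:29:34 GBPHKD HKD 279.17 2019 Dec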
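The mixed results described in the question come from ambiguous dates: 02/01/2020 is valid both day-first and month-first, while 31/12/2019 can only be day-first. A minimal sketch with two of the question's timestamps, assuming the whole column really is dd/mm/yyyy:

```python
import pandas as pd

raw = pd.Series(["02/01/2020 12:07:00", "31/12/2019 16:35:48"])

# An explicit day-first format removes the ambiguity for every row
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M:%S")
months = parsed.dt.strftime("%b")   # abbreviated month names
years = parsed.dt.year

# For contrast: reading the first value month-first gives 1 February
month_first = pd.to_datetime("02/01/2020 12:07:00", format="%m/%d/%Y %H:%M:%S")

print(months.tolist(), years.tolist(), month_first.month)
```

With the explicit format, a value that does not fit dd/mm/yyyy raises an error instead of being silently reinterpreted.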

Combine month name and year in a column pandas python

df
Year Month Name Avg
2015 Jan 12
2015 Feb 13.4
2015 Mar 10
...................
2019 Jan 11
2019 Feb 11
Code
df['Month Name-Year']= pd.to_datetime(df['Month Name'].astype(str)+df['Year'].astype(str),format='%b%Y')
In the dataframe, df, the groupby output avg is on keys month name and year. So month name and year are actually multilevel indices. I want to create a third column Month Name Year so that I can do some operation (create plots etc) using the data.
The output I am getting using the code is as below:
Year Month Name Avg Month Name-Year
2015 Jan 12 2015-01-01
2015 Feb 13.4 2015-02-01
2015 Mar 10 2015-03-01
...................
2019 Nov 11 2019-11-01
2019 Dec 11 2019-12-01
and so on.
The output I want is 2015-Jan, 2015-Feb etc in Month Name-Year column...or I want 2015-01, 2015-02...2019-11, 2019-12 etc (only year and month, no days).
Please help
One solution is to convert to datetimes and then change the format with Series.dt.to_period or Series.dt.strftime:
df['Month Name-Year']=pd.to_datetime(df['Month Name']+df['Year'].astype(str),format='%b%Y')
#for monthly periods ('M' is the month alias; lowercase 'm' is deprecated in recent pandas)
df['Month Name-Year1'] = df['Month Name-Year'].dt.to_period('M')
#for 2010-02 format
df['Month Name-Year2'] = df['Month Name-Year'].dt.strftime('%Y-%m')
The simplest solution skips the datetime conversion: convert the years to strings and join them to the month names with -:
#format 2010-Feb
df['Month Name-Year3'] = df['Year'].astype(str) + '-' + df['Month Name']
...which gives the same result as converting to datetimes and then formatting them as custom strings:
#format 2010-Feb
df['Month Name-Year31'] = df['Month Name-Year'].dt.strftime('%Y-%b')
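One practical difference between these formats shows up when sorting or plotting. The sketch below uses invented sample rows and the made-up column names as_period and as_text: monthly periods sort chronologically, while strings such as 2015-Feb sort alphabetically.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Year": [2015, 2015, 2015],
    "Month Name": ["Jan", "Feb", "Mar"],
    "Avg": [12, 13.4, 10],
})

stamp = pd.to_datetime(df["Month Name"] + df["Year"].astype(str), format="%b%Y")
df["as_period"] = stamp.dt.to_period("M")                        # 2015-01, ...
df["as_text"] = df["Year"].astype(str) + "-" + df["Month Name"]  # 2015-Jan, ...

# Periods keep calendar order; the text column puts Feb before Jan
by_period = df.sort_values("as_period")["Month Name"].tolist()
by_text = df.sort_values("as_text")["Month Name"].tolist()
print(by_period, by_text)
```

If the column is only used as a label, the text form is fine; for ordering, the period or the 2015-01 style string is the safer choice.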
