I need a column for the df that will be used to group it by weeks.
The problem is that all the reports in Tableau are built using the following format for the week: 2019-01-01, i.e. each week (Mon-Sun) is labelled with its first day.
Data:
cw = pd.DataFrame({"lead_date": pd.to_datetime(["2019-01-01 00:02:16", "2018-08-01 00:02:16", "2017-07-07 00:02:16", "2015-12-01 00:02:16", "2016-09-01 00:02:16"]),
"name": ["aa", "bb", "cc", "dd", "EE"]})
My code:
# extracting
cw["week"] = cw["lead_date"].apply(lambda df: df.strftime("%W") )
cw["month"] = cw["lead_date"].apply(lambda df: df.strftime("%m") )
cw["year"] = cw["lead_date"].apply(lambda df: df.strftime("%Y") )
Output:
lead_date year month week
2019-01-01 00:02:16, 2019 , 01 , 00
-
-
-
etc..
Desired output:
having week as a date rather than just 00, 01, etc.:
lead_date year month week
2019-01-01 00:02:16, 2019 , 01 , 2019-01-01
2019-01-15 00:02:16, 2019 , 01 , 2019-01-14
2019-01-25 00:02:16, 2019 , 01 , 2019-01-21
2019-01-28 00:02:16, 2019 , 01 , 2019-01-21
You can do it like this:
from datetime import timedelta
# pd.to_datetime accepts both strings and datetimes, including the time part
cw['week'] = pd.to_datetime(cw['lead_date']).apply(lambda r: (r - timedelta(days=r.weekday())).date())
This sets every date to the first day (Monday) of its week.
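The weekday arithmetic behind this answer can be checked on a single value. Below is a minimal sketch using only the standard library; the helper name week_start is made up for illustration, and weeks are assumed to run Monday to Sunday.

```python
from datetime import datetime, timedelta

def week_start(ts):
    """Return the Monday (as a date) of the week containing ts."""
    # weekday() is 0 for Monday ... 6 for Sunday
    return (ts - timedelta(days=ts.weekday())).date()

d = datetime.strptime("2019-01-01 00:02:16", "%Y-%m-%d %H:%M:%S")
print(week_start(d))  # 2018-12-31, because 2019-01-01 is a Tuesday
```

A date that is already a Monday maps to itself, so 2019-01-28 stays 2019-01-28.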
You can do it as follows, using the dayofweek attribute (pandas.DatetimeIndex.dayofweek) and pandas.Timedelta()
(note that the first day of the week containing 2019-01-01 is 2018-12-31):
import pandas as pd
cw = pd.DataFrame({"lead_date" : pd.DatetimeIndex([
"2019-01-01 00:02:16", "2018-08-01 00:02:16" , "2017-07-07 00:02:16",
"2015-12-01 00:02:16", "2016-09-01 00:02:16"]),
"name": ["aa","bb","cc", "dd", "EE"]})
# extracting
cw["month"] = cw["lead_date"].apply(lambda df: df.strftime("%m") )
cw["year"] = cw["lead_date"].apply(lambda df: df.strftime("%Y") )
# subtract the weekday offset (Monday = 0), then drop the time part
cw["week"] = (cw["lead_date"] - cw["lead_date"].dt.dayofweek *
pd.Timedelta(days=1)).dt.normalize()
print(cw[["lead_date", "year", "month", "week"]])
Out:
lead_date year month week
0 2019-01-01 00:02:16 2019 01 2018-12-31
1 2018-08-01 00:02:16 2018 08 2018-07-30
2 2017-07-07 00:02:16 2017 07 2017-07-03
3 2015-12-01 00:02:16 2015 12 2015-11-30
4 2016-09-01 00:02:16 2016 09 2016-08-29
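Since the goal in the question is grouping by week, the same Monday label can also be derived from weekly periods. This is only a sketch on a shortened, made-up sample; it assumes a Monday-to-Sunday week, which is what the "W-SUN" frequency (weeks ending on Sunday) describes.

```python
import pandas as pd

cw = pd.DataFrame({
    "lead_date": pd.to_datetime([
        "2019-01-01 00:02:16", "2019-01-03 10:00:00", "2018-08-01 00:02:16"]),
    "name": ["aa", "bb", "cc"],
})

# "W-SUN" periods end on Sunday, so each period starts on a Monday
cw["week"] = cw["lead_date"].dt.to_period("W-SUN").dt.start_time

# rows per week, keyed by the Monday that starts the week
counts = cw.groupby("week")["name"].count()
print(counts)
```

The week column stays a datetime column, so it can be formatted with dt.strftime("%Y-%m-%d") if Tableau expects a string.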
I think this gets you the output you want:
cw = pd.DataFrame({ "lead_date" : [pd.to_datetime('2019-01-01 00:02:16'), pd.to_datetime('2018-08-01 00:02:16') , pd.to_datetime('2017-07-07 00:02:16'), pd.to_datetime('2015-12-01 00:02:16'), pd.to_datetime('2016-09-01 00:02:16')] ,
"name": ["aa","bb","cc", "dd", "EE"] })
cw["year"] = cw["lead_date"].apply(lambda df: df.strftime("%Y") )
cw["month"] = cw["lead_date"].apply(lambda df: df.strftime("%m") )
cw["week"] = cw["lead_date"].apply(lambda df: df.strftime("%Y-%m-%d") )
cw.drop(columns='name', inplace=True)
output:
lead_date year month week
0 2019-01-01 00:02:16 2019 01 2019-01-01
1 2018-08-01 00:02:16 2018 08 2018-08-01
2 2017-07-07 00:02:16 2017 07 2017-07-07
3 2015-12-01 00:02:16 2015 12 2015-12-01
4 2016-09-01 00:02:16 2016 09 2016-09-01
Related
I have a dataframe df with Date column:
Date
--------
Wed 23 Dec
Sat 28 Nov
Thu 26 Nov
Sun 22 Nov
Tue 1 Dec
Wed 2 Dec
The Date column is object-type, I want to change the format using format="%m-%d-%Y" into yyyy-dd-mm
Expected output df:
Date
---------
2020-23-12
2020-28-11
2020-26-11
2020-22-11
2020-01-12
2020-02-12
Thanks in advance for the help!
Use to_datetime on the original strings with the year appended and a matching format; this gives a column of datetimes:
df['Date'] = pd.to_datetime(df['Date']+'2020', format="%a %d %b%Y")
print (df)
Date
0 2020-12-23
1 2020-11-28
2 2020-11-26
3 2020-11-22
4 2020-12-01
5 2020-12-02
If you need a custom format, add Series.dt.strftime; note that the result is a column of strings, not datetimes:
df['Date'] = pd.to_datetime(df['Date']+'2020', format="%a %d %b%Y").dt.strftime("%Y-%d-%m")
print (df)
Date
0 2020-23-12
1 2020-28-11
2 2020-26-11
3 2020-22-11
4 2020-01-12
5 2020-02-12
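Because the formatted column holds strings, it may be worth keeping the parsed datetimes next to it. The sketch below is one way to do that; the column names parsed and label are made up for illustration, and the year 2020 is appended as in the answer.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["Wed 23 Dec", "Sat 28 Nov", "Tue 1 Dec"]})

# parse once and keep real datetimes (year 2020 is an assumption, as in the answer)
df["parsed"] = pd.to_datetime(df["Date"] + " 2020", format="%a %d %b %Y")
# formatted copy for display only; this column holds plain strings
df["label"] = df["parsed"].dt.strftime("%Y-%d-%m")

# sorting by the datetime column is chronological;
# sorting the yyyy-dd-mm strings would order by day before month
print(df.sort_values("parsed"))
```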
I have a pandas column like this :
yrmnt
--------
2015 03
2015 03
2013 08
2015 08
2014 09
2015 10
2016 02
2015 11
2015 11
2015 11
2017 02
How to fetch lowest year month combination :2013 08 and highest : 2017 02
And find the difference in months between these two, ie 40
You can convert the column with to_datetime and then find the indices of the max and min values with idxmax and
idxmin:
a = pd.to_datetime(df['yrmnt'], format='%Y %m')
print (a)
0 2015-03-01
1 2015-03-01
2 2013-08-01
3 2015-08-01
4 2014-09-01
5 2015-10-01
6 2016-02-01
7 2015-11-01
8 2015-11-01
9 2015-11-01
10 2017-02-01
Name: yrmnt, dtype: datetime64[ns]
print (df.loc[a.idxmax(), 'yrmnt'])
2017 02
print (df.loc[a.idxmin(), 'yrmnt'])
2013 08
Difference in months:
b = a.dt.to_period('M')
d = b.max() - b.min()
print (d)
42
Another solution works only with monthly periods created by Series.dt.to_period:
b = pd.to_datetime(df['yrmnt'], format='%Y %m').dt.to_period('M')
print (b)
0 2015-03
1 2015-03
2 2013-08
3 2015-08
4 2014-09
5 2015-10
6 2016-02
7 2015-11
8 2015-11
9 2015-11
10 2017-02
Name: yrmnt, dtype: object
Then convert the minimal and maximal values to the custom format with Period.strftime:
min_d = b.min().strftime('%Y %m')
print (min_d)
2013 08
max_d = b.max().strftime('%Y %m')
print (max_d)
2017 02
And subtract for difference:
d = b.max() - b.min()
print (d)
42
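Depending on the pandas version, subtracting two Period values may print as an offset such as <42 * MonthEnds> rather than a bare integer. A sketch that sidesteps this by computing the month count from the year and month parts directly (shortened sample data, variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"yrmnt": ["2015 03", "2013 08", "2014 09", "2017 02"]})
a = pd.to_datetime(df["yrmnt"], format="%Y %m")

lo, hi = a.min(), a.max()
# whole months between the two dates: 12 per year plus the month offset
months = (hi.year - lo.year) * 12 + (hi.month - lo.month)
print(months)  # 42
```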
I am currently working on a dataset of 8 000 rows.
I want to split my date column into day, month and year. The dtype of the date column is object.
How do I convert the whole date column into day, month and year?
A sample of the date of my dataset is shown below:
date
01-01-2016
01-01-2016
01-01-2016
01-01-2016
01-01-2016
df=pd.DataFrame(columns=['date'])
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
print(df)
dt=datetime.strptime('date',"%d-%m-%y")
print(dt)
This is the code I am using for date splitting, but it gives me an error:
ValueError: time data 'date' does not match format '%d-%m-%y'
If you have pandas you can do this:
import pandas as pd
# Recreate your dataframe
df = pd.DataFrame(dict(date=['01-01-2016']*6))
df.date = pd.to_datetime(df.date)
# Create 3 new columns
df[['year','month','day']] = df.date.apply(lambda x: pd.Series(x.strftime("%Y,%m,%d").split(",")))
df
Returns
date year month day
0 2016-01-01 2016 01 01
1 2016-01-01 2016 01 01
2 2016-01-01 2016 01 01
3 2016-01-01 2016 01 01
4 2016-01-01 2016 01 01
5 2016-01-01 2016 01 01
Or without the formatting options:
df['year'],df['month'],df['day'] = df.date.dt.year, df.date.dt.month, df.date.dt.day
df
Returns
date year month day
0 2016-01-01 2016 1 1
1 2016-01-01 2016 1 1
2 2016-01-01 2016 1 1
3 2016-01-01 2016 1 1
4 2016-01-01 2016 1 1
5 2016-01-01 2016 1 1
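The ValueError in the question comes from passing the literal string 'date' to strptime instead of the column values; in addition, %y matches a two-digit year while the data has four digits (%Y). A sketch with an explicit format, assuming the strings are day-first (the sample rows here are made up to show the difference):

```python
import pandas as pd

df = pd.DataFrame({"date": ["01-01-2016", "15-02-2016", "31-12-2017"]})

# day-first layout assumed here; %Y (capital) matches a four-digit year
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

# integer components straight from the .dt accessor
df = df.assign(day=df["date"].dt.day,
               month=df["date"].dt.month,
               year=df["date"].dt.year)
print(df)
```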
I found this but can't get the syntax correct.
time.asctime(time.strptime('2017 28 1', '%Y %W %w'))
I want to add a new column that shows the month in the format "201707" for July. It can be int64 or a string; it doesn't have to be an actual readable date.
My dataframe column ['Week'] is also in the format 201729, i.e. YYYYWW.
dfAttrition_Billings_KPIs['Day_1'] = \
time.asctime(time.strptime(dfAttrition_Billings_KPIs['Week'].str[:4]
+ dfAttrition_Billings_KPIs['Month'].str[:-2] - 1 + 1', '%Y %W %w'))
So I want rows that have week 201729 to show month 201707 in a new field; the output depends on the row's value in 'Week'.
I have a million records so would like to avoid iterations of rows, lambdas and slow functions where possible :)
Use to_datetime with the format parameter, appending 1 to select the Monday of each week; then use strftime for the YYYYMM format:
df = pd.DataFrame({'date':[201729,201730,201735]})
df['date1']=pd.to_datetime(df['date'].astype(str) + '1', format='%Y%W%w')
df['date2']=pd.to_datetime(df['date'].astype(str) + '1', format='%Y%W%w').dt.strftime('%Y%m')
print (df)
date date1 date2
0 201729 2017-07-17 201707
1 201730 2017-07-24 201707
2 201735 2017-08-28 201708
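Since the question allows an int64 result, the month can also be built arithmetically from the parsed Mondays. This is a sketch only; the column names Week and Month follow the question, and a week that spans two months is assigned to the month of its Monday.

```python
import pandas as pd

df = pd.DataFrame({"Week": [201729, 201730, 201735]})

# append "1" so %w selects the Monday of each %W week
monday = pd.to_datetime(df["Week"].astype(str) + "1", format="%Y%W%w")
# integer YYYYMM, computed column-wise without apply or lambdas
df["Month"] = monday.dt.year * 100 + monday.dt.month
print(df)
```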
If you need to convert from datetimes to a custom week format:
df = pd.DataFrame({'date':pd.date_range('2017-01-01', periods=10)})
df['date3'] = df['date'].dt.strftime('%Y %W %w')
print (df)
date date3
0 2017-01-01 2017 00 0
1 2017-01-02 2017 01 1
2 2017-01-03 2017 01 2
3 2017-01-04 2017 01 3
4 2017-01-05 2017 01 4
5 2017-01-06 2017 01 5
6 2017-01-07 2017 01 6
7 2017-01-08 2017 01 0
8 2017-01-09 2017 02 1
9 2017-01-10 2017 02 2
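To get back to the YYYYWW layout used in the question's Week column, the same directives can be combined without separators. A small sketch on made-up dates:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2017-07-17", "2017-07-23", "2017-08-28"])})

# %W counts weeks starting on Monday; days before the first Monday fall in week 00
df["Week"] = df["date"].dt.strftime("%Y%W").astype(int)
print(df)
```

Monday 2017-07-17 and Sunday 2017-07-23 land in the same week, 201729.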
My dataframe has a column of dates like 2014-11-12.
I want to split it into two columns, Year and Month_date, with 2014 in the 'Year' column and Nov 12 in the 'Month_date' column. I have split the Date column but am not able to get the 'Nov 12' format. I am new to Python. Any help will be highly appreciated.
I think you need:
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month-Date'] = df['Date'].dt.strftime('%m-%d')
print (df)
ID Date Data_Value Year Month-Date
0 USW00094889 2014-11-12 22 2014 11-12
1 USC00208972 2009-04-29 56 2009 04-29
2 USC00200032 2008-05-26 278 2008 05-26
3 USC00205563 2005-11-11 139 2005 11-11
4 USC00200230 2014-02-27 -106 2014 02-27
5 USW00014833 2010-10-01 194 2010 10-01
6 USC00207308 2010-06-29 144 2010 06-29
For an abbreviated month name (Nov-12), use %b instead of %m:
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month-Date'] = df['Date'].dt.strftime('%b-%d')
print (df)
ID Date Data_Value Year Month-Date
0 USW00094889 2014-11-12 22 2014 Nov-12
1 USC00208972 2009-04-29 56 2009 Apr-29
2 USC00200032 2008-05-26 278 2008 May-26
3 USC00205563 2005-11-11 139 2005 Nov-11
4 USC00200230 2014-02-27 -106 2014 Feb-27
5 USW00014833 2010-10-01 194 2010 Oct-01
6 USC00207308 2010-06-29 144 2010 Jun-29
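The question asks for 'Nov 12' with a space rather than a hyphen, which only changes the separator in the format string. A sketch on a few of the dates above, using the Month_date column name from the question:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2014-11-12", "2009-04-29", "2010-10-01"]})
df["Date"] = pd.to_datetime(df["Date"])

df["Year"] = df["Date"].dt.year
# %b is the abbreviated month name (locale dependent), %d the zero-padded day
df["Month_date"] = df["Date"].dt.strftime("%b %d")
print(df)
```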