I have a dataset as below:
I tried to calculate the day of the week using the year, month and day, but got an error as follows:
Can someone please help me code this?
You can use pandas.Series.dt.day_name.
import datetime
import pandas as pd
df = pd.DataFrame({'date': [datetime.datetime(2021,1,1),
datetime.datetime(2021,4,1),
datetime.datetime(2021,5,1),
datetime.datetime(2022,3,1),
datetime.datetime(2022,3,31)
]})
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
Method 1: using the datetime values.
df['date'].dt.day_name()
>>>
0 Friday
1 Thursday
2 Saturday
3 Tuesday
4 Thursday
Name: date, dtype: object
Method 2: using the year, month, and day values.
pd.to_datetime(df[['year', 'month', 'day']]).dt.day_name()
>>>
0 Friday
1 Thursday
2 Saturday
3 Tuesday
4 Thursday
dtype: object
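To store the result rather than just display it, the Method 2 expression can be assigned to a new column. The following is a minimal sketch; the column names `day_of_week` and `weekday_number` are illustrative and not from the original question.

```python
import pandas as pd

# Small frame with separate year/month/day columns, as in Method 2
df = pd.DataFrame({'year': [2021, 2021, 2022],
                   'month': [1, 4, 3],
                   'day': [1, 1, 31]})

# Assemble a datetime Series from the three columns
dates = pd.to_datetime(df[['year', 'month', 'day']])

# Hypothetical column names, chosen for illustration only
df['day_of_week'] = dates.dt.day_name()      # e.g. 'Friday'
df['weekday_number'] = dates.dt.dayofweek    # Monday is 0, Sunday is 6

print(df)
```

Note that `pd.to_datetime` expects the columns to be named `year`, `month` and `day` (or their plurals) when it is given a DataFrame.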
I want to add a column called 'Date' that starts from today's date and advances by business days going down the df, up to one year ahead. I am trying the code below, but it repeats days because it adds a business day to Fridays and Saturdays. The output should have row 1 = 2021-10-07 and end with 2022-10-08, with only business days shown. Can anyone help, please?
import datetime as dt
import pandas as pd
from pandas.tseries.offsets import BDay
from datetime import date
df = pd.DataFrame({'Date': pd.date_range(start=date.today(), end=date.today() + dt.timedelta(days=365))})
df['Date'] = df['Date'] + BDay(1)
It is unclear what your desired output is, but if you want a column 'Date' that only shows the dates for business days, you can use the code below.
import datetime as dt
import pandas as pd
from datetime import date
df = pd.DataFrame({'Date': pd.date_range(start=date.today(), end=date.today() + dt.timedelta(days=365))})
df = df[df.Date.dt.weekday < 5]  # weekday: 0 is Monday, 6 is Sunday
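As an alternative to filtering afterwards, pandas can generate only business days in the first place with `pd.bdate_range`. This is a sketch under assumptions: the helper name `business_days_frame` is made up for illustration, and a fixed start date is used so the result is reproducible (pass `date.today()` to start from today).

```python
import pandas as pd

def business_days_frame(start, days=365):
    """Hypothetical helper: one row per business day from start to start + days."""
    start = pd.Timestamp(start).normalize()
    end = start + pd.Timedelta(days=days)
    # bdate_range yields Monday to Friday only; public holidays are NOT excluded
    return pd.DataFrame({'Date': pd.bdate_range(start=start, end=end)})

df = business_days_frame('2021-10-07')
print(df.head())
print(df.tail())
```

Because the range is built from weekdays only, no date is repeated, which avoids the duplication caused by adding `BDay(1)` to weekend dates.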
I am fetching data from a file in which dates are stored as
20 March
Using pandas, I want to convert this to 20/03/2020.
I tried strftime and to_datetime with the errors argument, but I still cannot convert it.
Moreover, when I group by date, the date column is ordered by day number, like:
1 January, 1 February, 1 March, then 2 January, 2 February, 2 March
How do I resolve this?
import pandas as pd
def to_datetime_(dt):
    return pd.to_datetime(dt + " 2020")
This returns a pandas timestamp whose year is always 2020.
If the year is always 2020, use the following code:
df = pd.DataFrame({'date':['20 March','22 March']})
df['date_new'] = pd.to_datetime(df['date'], format='%d %B')
If this shows the year as 1900, then:
df['date_new'] = df['date_new'].mask(df['date_new'].dt.year == 1900, df['date_new'] + pd.offsets.DateOffset(year = 2020))
print(df)
date date_new
0 20 March 2020-03-20
1 22 March 2020-03-22
Further you can convert the date format as required.
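For the last step, here is a minimal sketch of producing the 20/03/2020 format and of getting chronological ordering. It assumes every row belongs to 2020 and that month names are in English; the column names `date_new` and `date_str` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': ['20 March', '1 January', '2 February']})

# Assumption: all rows are in 2020. Appending the year before parsing
# avoids the default year of 1900 that '%d %B' alone would give.
df['date_new'] = pd.to_datetime(df['date'] + ' 2020', format='%d %B %Y')

# Sort (or group) on the real datetime column, not on the text,
# so that January comes before February and March.
df = df.sort_values('date_new')

# Format as text only at the end, for display or export
df['date_str'] = df['date_new'].dt.strftime('%d/%m/%Y')

print(df)
```

Keeping a true datetime column for sorting and grouping, and formatting it as a string only for output, is what resolves the "1 January, 1 February, 1 March" ordering described in the question.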
Try this:
import pandas as pd
import datetime
df = pd.DataFrame({
'dates': ['1 January', '2 January', '10 March', '1 April']
})
df['dates'] = df['dates'].map(lambda x: datetime.datetime.strptime(x, "%d %B").replace(year=2020))
# Output
dates
0 2020-01-01
1 2020-01-02
2 2020-03-10
3 2020-04-01
I have three columns in a pandas dataframe that I want to convert into a single date column. The problem is that one of the columns holds an ordinal weekday (such as '1st Monday') rather than a day of the month, so I cannot convert it into the exact date for that month and year. Can anyone please help me solve this? It looks something like this:
BirthMonth BirthYear Day
0 5 88 1st Monday
1 10 87 3rd Tuesday
2 12 87 2nd Saturday
3 1 88 1st Tuesday
4 2 88 1st Monday
Based on your reply to my first comment I updated my answer as follows. I think this is what you are looking for:
import re
import time
import calendar
import numpy as np
days = ['1st Monday', '3rd Tuesday', '4th wednesday']
months = [2, 3, 5]
years = [1990, 2000, 2019]
def extract_numeric(text: str):
    return int(re.findall(r'\d+', text)[0])

def weekday_to_number(weekday: str):
    return time.strptime(weekday, "%A").tm_wday

def get_date(number: int, weekday: int, month: int, year: int) -> str:
    """ 3rd Tuesday translates to number: 3, weekday: 1 """
    firstday, n_days = calendar.monthrange(year, month)
    day_list = list(range(7)) * 6
    month_days = day_list[firstday:][:n_days]
    day = (np.where(np.array(month_days) == weekday)[0] + 1)[number - 1]
    return '{}/{}/{}'.format(day, month, year)

numbers = []
weekdays = []
for day in days:
    number, weekday = day.split()
    numbers.append(extract_numeric(number))
    weekdays.append(weekday_to_number(weekday))

dates = []
for number, weekday, month, year in zip(numbers, weekdays, months, years):
    dates.append(get_date(number, weekday, month, year))
print(dates)  # ['5/2/1990', '21/3/2000', '22/5/2019']
Use the calendar module to get the day of the month from the days column, then convert the day, month and year to a date.
import calendar
import datetime
def get_date(rows):
    # Map weekday names to the calendar module's numbering (Monday is 0)
    day = {'monday':0,'tuesday':1,'wednesday':2,'thursday':3,'friday':4,'saturday':5,'sunday':6}
    day_num = day.get(rows.days.split()[1].lower())
    # Collect every matching weekday in the month, then pick the nth one
    weekday_num = [week[day_num] for week in calendar.monthcalendar(rows.years, rows.months) if week[day_num] > 0][int(rows.days.split()[0][0]) - 1]
    return datetime.date(rows.years, rows.months, weekday_num)
Apply the above function to all rows:
df['date'] = df.apply(lambda row: get_date(row), axis=1)
df
>>
days months years date
0 1st Monday 8 2015 2015-08-03
1 3rd Tuesday 12 2017 2017-12-19
2 4th wednesday 5 2019 2019-05-22
This is not a very fast solution (it involves two nested loops), but I hope it answers your question:
import pandas as pd
import datetime
import calendar
pd.set_option('display.max_rows', 100)
cols = ['day', 'month', 'year']
data = [
['1st Monday', 8, 2015],
['3rd Tuesday', 12, 2017],
['4th Wednesday', 5, 2019]
]
df = pd.DataFrame(data=data, columns=cols)
df['week_number'] = df['day'].str.slice(0, 1)
df['week_number'] = df['week_number'].astype('int')
df['day_name'] = df['day'].str.slice(4)
def generate_dates(input_df, index_num):
    _, days = calendar.monthrange(input_df.loc[index_num, 'year'], input_df.loc[index_num, 'month'])
    df_dates = pd.DataFrame()
    for i in range(1, days + 1):
        df_dates.loc[i - 1, 'date'] = datetime.date(input_df.loc[index_num, 'year'], input_df.loc[index_num, 'month'],
                                                    i)
        df_dates.loc[i - 1, 'year'] = input_df.loc[index_num, 'year']
        df_dates.loc[i - 1, 'days'] = calendar.weekday(input_df.loc[index_num, 'year'],
                                                       input_df.loc[index_num, 'month'], i)
        df_dates.loc[i - 1, 'day_name'] = df_dates.loc[i - 1, 'date'].strftime("%A")
    df_dates['week_number'] = 1
    df_dates['week_number'] = df_dates.groupby('day_name')['week_number'].cumsum()
    return df_dates
dates = pd.DataFrame(columns=['date', 'year', 'days', 'day_name', 'week_number'])
for row in df.index:
    dates = pd.concat([dates, generate_dates(df, row)])
df2 = df.merge(dates, on=['year', 'day_name', 'week_number'])
print(df2)
Edited to match the OP's new dataframe.
My solution using the pandas dayofweek attribute:
import numpy as np
import pandas as pd
from datetime import date
from dateutil.relativedelta import relativedelta
# Generate the dataframe
df=pd.DataFrame({'BirthMonth':[5, 10, 12, 1 ,2],
                 'BirthYear':[88, 87, 87, 88, 88],
                 'Day':['1st Monday', '3rd Tuesday', '2nd Saturday','1st Tuesday','1st Monday']})
# Assuming the year refers to 19xx
df.BirthYear=1900+df.BirthYear
# List of day names
weekday=['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
# Identify the day name in the input df
days_ex=[s.split()[1].title() for s in df.Day]
# Initialize the output list
dateout= ["" for x in range(len(days_ex))]
for j in range(len(days_ex)):
    # Identify the day number in the week (Monday is 0, Sunday is 6)
    daynum=np.nonzero(np.char.rfind(weekday,days_ex[j])==0)[0][0]
    # Create start and end dates for the month
    date_start=date(df.BirthYear[j],df.BirthMonth[j],1)
    date_end=date_start+relativedelta(months=+1)
    # Day-of-week values for each day within the month of interest
    idx=pd.date_range(date_start,date_end,freq='d').dayofweek
    # Find the matching date based on the input df
    realday=np.where(idx==daynum)[0][int(df.Day[j][0])-1]+1
    # Store the result in the output list
    dateout[j]=str(realday)+'/'+str(df.BirthMonth[j])+'/'+str(df.BirthYear[j])
The result I got is:
['2/5/1988', '20/10/1987', '12/12/1987', '5/1/1988', '1/2/1988']
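The answers above all search through the days of the month. The same result can also be computed arithmetically from the weekday of the first day of the month. The following is a sketch, not any of the answerers' methods; the helper names `parse_ordinal_weekday` and `nth_weekday` are hypothetical, and weekday names are assumed to be in English.

```python
import calendar
import re
from datetime import date

# Monday is 0, matching calendar.monthrange and date.weekday()
WEEKDAYS = ['monday', 'tuesday', 'wednesday', 'thursday',
            'friday', 'saturday', 'sunday']

def parse_ordinal_weekday(text):
    """Hypothetical helper: '3rd Tuesday' -> (3, 1)."""
    ordinal, name = text.split()
    n = int(re.match(r'\d+', ordinal).group())
    return n, WEEKDAYS.index(name.lower())

def nth_weekday(year, month, n, weekday):
    """Hypothetical helper: date of the n-th given weekday in a month."""
    first_weekday, n_days = calendar.monthrange(year, month)
    # Offset from the 1st to the first matching weekday, plus whole weeks
    day = 1 + (weekday - first_weekday) % 7 + 7 * (n - 1)
    if day > n_days:
        raise ValueError('that month has no such weekday occurrence')
    return date(year, month, day)

n, wd = parse_ordinal_weekday('3rd Tuesday')
print(nth_weekday(1987, 10, n, wd))
```

With a dataframe like the one in the question, this could be applied row by row with `df.apply(..., axis=1)` after converting the two-digit year to four digits.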
I know the current year and the current week; for example, the current year is 2018 and the current week is 8.
I want to know which year and week it was 10 weeks ago; here, 10 weeks ago is the fiftieth week of 2017.
currentYear=2018
currentWeek=8
How can I get it?
In [31]: from datetime import datetime as dt
In [32]: from datetime import timedelta
In [33]: current_date = dt(2018, 2, 20)
In [34]: current_date
Out[34]: datetime.datetime(2018, 2, 20, 0, 0)
In [35]: current_date.strftime('%V')  # this is how we can get the week of the year
Out[35]: '08'
In [36]: current_date - timedelta(weeks=10)  # this is how to go back in time
Out[36]: datetime.datetime(2017, 12, 12, 0, 0)
In [37]: ten_weeks_ago = _
In [38]: ten_weeks_ago.strftime('%V')
Out[38]: '50'
The best way, without knowing a date in the 8th week of 2018, is to create a date from just the week and year.
Here it is:
import datetime
d = "%s-W%s"%(currentYear,currentWeek)
r = datetime.datetime.strptime(d + '-0', "%Y-W%W-%w")
print(r-datetime.timedelta(weeks=10))
Output:
2017-12-17 00:00:00
Or, if you want it in week format:
print((r-datetime.timedelta(weeks=10)).strftime('%V'))
Output:
50
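One caveat: `%W` counts weeks from the first Monday of the year, while `%V` is the ISO 8601 week number, and the two only coincide in some years (2018 is one, because 1 January 2018 is a Monday). If the week numbers are ISO weeks throughout, the following sketch stays in ISO terms. It assumes Python 3.8 or later for `date.fromisocalendar`; the function name `weeks_ago` is illustrative.

```python
from datetime import date, timedelta

def weeks_ago(iso_year, iso_week, weeks):
    """Hypothetical helper: (ISO year, ISO week) a number of weeks earlier."""
    # Any weekday works as the anchor; Monday (1) is used here
    monday = date.fromisocalendar(iso_year, iso_week, 1)
    earlier = monday - timedelta(weeks=weeks)
    year, week, _ = earlier.isocalendar()
    return year, week

print(weeks_ago(2018, 8, 10))
```

This returns both the year and the week, so the year rollover from 2018 to 2017 is handled without parsing strings.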
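First import the datetime class, then parse the year and week:
from datetime import datetime
currentYear=datetime.strptime("2018-8-1", "%Y-%W-%w")
The string represents the year and the week; an arbitrary weekday (here 1, Monday) has to be added, because strptime ignores the week number unless a weekday is also given.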