I have a dataframe with a bunch of dates in it, and I would like to check each entry to see whether it is a weekday or a weekend; if it is a weekend, I would like to shift the date to the next weekday. What is the most pythonic way of doing this? I was thinking about using a list comprehension and something like
days = pd.date_range(start='1/1/2020', end='1/08/2020')
dates = pd.DataFrame(days,columns=['dates'])
dates['dates'] = [day+pd.DateOffset(days=1) if day.weekday() >4 else day for day in dates['dates']]
How could I adjust the code to apply day + pd.DateOffset(days=1) for Sunday and day + pd.DateOffset(days=2) for Saturday, and get an updated column with the shifted dates? Running the code twice with +1 would work, but it is certainly not pretty.
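For what it's worth, pandas' business-day offset can handle both the Saturday (+2) and Sunday (+1) cases in a single pass; a minimal sketch of that approach, reusing the frame from the question:

```python
import pandas as pd

days = pd.date_range(start='1/1/2020', end='1/08/2020')
dates = pd.DataFrame(days, columns=['dates'])

# BDay().rollforward leaves weekdays untouched and rolls Saturday and
# Sunday forward on to the following Monday in one step
dates['dates'] = dates['dates'].apply(pd.offsets.BDay().rollforward)
```

Here 2020-01-04 (Saturday) and 2020-01-05 (Sunday) both end up on Monday 2020-01-06, while the weekdays stay unchanged.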
Let's assume that I have a list of dates like:
["2022-01-02", "2022-01-03"]
and so on. I also have a list of dates, in the same format, that are considered a "holiday". Now, I want to check whether the dates in the first list cover the whole business week (Monday to Friday), taking the holiday list into consideration. So if, for example, I am given a list of days from Tuesday to Friday and there is no holiday on Monday, I should get False. But if there were a holiday on Monday, I should get True.
It would be perfect if it worked between any two given dates, so that it isn't week-specific.
Is there any nice way of doing such things in Python? How would you do it? Thanks in advance.
It's a little difficult for me to understand exactly what you want, but I'll give you a head start and then you can tweak the if/else statements to your liking.
The datetime module has many features that might be of interest to you, including date.fromisoformat and isoweekday.
The following code checks whether a day is a weekday and whether any of the days are holidays:
from datetime import date

# List of dates to check
dates = ["2022-01-04", "2022-01-05", "2022-01-06"]
# List of holidays
holidays = ["2022-01-03", "2022-01-06"]

# Convert both lists to date objects (comparing date objects against the
# raw holiday strings would never match)
date_objects = [date.fromisoformat(d) for d in dates]
holiday_objects = [date.fromisoformat(d) for d in holidays]

covers_week = True
for d in date_objects:
    # isoweekday() is 1 for Monday .. 7 for Sunday
    if d.isoweekday() not in range(1, 6):
        # If any of the dates are not weekdays, the list does not cover a whole business week
        covers_week = False

# Check if there are any holidays in the list
for d in date_objects:
    if d in holiday_objects:
        # If there are any holidays in the list, the list does not cover a whole business week
        covers_week = False

# Print result
print(covers_week)
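To address the "between any two given dates" part of the question directly: every business day between the bounds must appear either in the date list or in the holiday list. A compact sketch using pandas (the helper name covers_business_week is just illustrative):

```python
import pandas as pd

def covers_business_week(dates, holidays, start, end):
    """True if every business day between start and end (inclusive)
    is either present in `dates` or excused by `holidays`."""
    need = set(pd.bdate_range(start, end).date)
    have = set(pd.to_datetime(dates).date) | set(pd.to_datetime(holidays).date)
    return need <= have
</ ```

With dates for Tuesday to Friday plus a Monday holiday this returns True; drop the holiday and it returns False.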
I have a time series where I want to get the sum of the business-day values for each week. A snapshot of the dataframe (df) used is shown below. Note that 2017-06-01 is a Friday, and hence the missing days represent the weekend.
I use resample to group the data by week, and my aim is to get the sum. When I apply this function, however, I get results which I can't justify. I was expecting the first row to be 0, which is the sum of the values contained in the first week, then 15 for the next week, etc...
df_resampled = df.resample('W', label='left').sum()
df_resampled.head()
Can someone explain what I am missing, since it seems I have not understood the resampling function correctly?
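For context on what 'W' does here: it is an alias for 'W-SUN', so each bin runs Monday through Sunday and is closed on the right; label='left' then stamps each bin with the Sunday *before* it, which is why the labels look shifted back. A minimal reproduction:

```python
import pandas as pd

# Five business days, Mon 2023-01-02 .. Fri 2023-01-06, values 1..5
df = pd.DataFrame({'val': [1, 2, 3, 4, 5]},
                  index=pd.bdate_range('2023-01-02', periods=5))

res = df.resample('W', label='left').sum()
# All five days fall into the bin (Sun 2023-01-01, Sun 2023-01-08],
# which label='left' stamps with its left edge, 2023-01-01
```

So a week's sum is not missing; it is simply labelled with the Sunday preceding that week.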
I have a pandas dataframe with a datetime index (30-minute frequency), and I want to remove the last "n" days from it. My dataframe does not include weekends, so if its last day is a Monday, I want to remove Monday, Friday and Thursday (counting from the end). So I mean observed days, not calendar days. What is the most pythonic way to do it?
Thanks.
Pandas knows about Monday to Friday as business days.
So if you want to remove the last n business days from your dataframe, you can just do:
df.drop(df[df.index >= df.index.max().date()-pd.offsets.BDay(n-1)].index, inplace=True)
If you really need to remove observed days in the dataframe, it will be slightly more complex because you will have to count the days. The code could be (using a companion dataframe called df_days):
# create a dataframe with same index and only one row per day:
df_days = pd.DataFrame(index=df.index).assign(day=df.index.date).drop_duplicates('day')
# now count the observed day in the companion dataframe
df_days['new_day'] = 1
df_days['days'] = df_days['new_day'].cumsum()
# compute the first index belonging to the last n observed days
ix = df_days.loc[df_days['days'] == df_days['days'].max() + 1 - n].index[0]
# drop the last n observed days from the initial dataframe (>= so the first
# timestamp of the cut-off day is removed too) and delete the companion one
df.drop(df.loc[df.index >= ix].index, inplace=True)
del df_days
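An alternative sketch of the same "observed days" trim without a companion dataframe, using index.normalize() to collapse the 30-minute stamps down to days (the toy frame and the value of n here are purely illustrative):

```python
import pandas as pd

# Toy frame: two 30-minute stamps per day over five business days
idx = pd.DatetimeIndex([d + pd.Timedelta(minutes=m)
                        for d in pd.bdate_range('2024-01-01', periods=5)
                        for m in (540, 570)])          # 09:00 and 09:30
df = pd.DataFrame({'x': range(len(idx))}, index=idx)

n = 2
observed = df.index.normalize().unique()   # one timestamp per observed day
cutoff = observed[-n]                      # first of the last n observed days
trimmed = df[df.index.normalize() < cutoff]
```

This keeps every row whose (normalized) day falls strictly before the n-th observed day from the end.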
I can specify the date range of month ends using
import datetime
import pandas as pd

monthend_range = pd.date_range(datetime.date(2017, 12, 10), datetime.date(2018, 2, 2), freq='BM')
Is there a straightforward way to include the middle of the month in the range above, to form a middle-and-end-of-month index? The logic we want is to take the successive month ends from the code above and find the business day right in the middle between them. If that is not a business day, then try the following day, and so on, until we get a business day.
The expected output is
['2017-12-29', '2018-01-16', '2018-01-31']
This might seem a bit inconsistent, as 2017-12-15 is a middle of the month that falls within the date range. But the procedure is: get the month ends, then interpolate between them. Unless, of course, there is a better approach to this question.
The idea is to create a business-day range for each value (omitting the first), select the value in the middle, and finally use Index.union to join everything together:
a = []
for x in monthend_range[1:]:
    r = pd.date_range(x.to_period('m').to_timestamp(), x, freq='B')
    a.append(r[len(r) // 2])

print(a)
[Timestamp('2018-01-16 00:00:00', freq='B')]
out = monthend_range.union(a)
print (out)
DatetimeIndex(['2017-12-29', '2018-01-16', '2018-01-31'], dtype='datetime64[ns]', freq=None)
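Wrapped up as a reusable helper (the name mid_and_end_month is just for illustration), the same idea becomes:

```python
import pandas as pd

def mid_and_end_month(start, end):
    """Business month ends between start and end, plus the middle
    business day of each covered month (skipping the first month end)."""
    ends = pd.date_range(start, end, freq='BM')
    mids = []
    for x in ends[1:]:
        # business days from the 1st of x's month up to x itself
        r = pd.date_range(x.to_period('M').to_timestamp(), x, freq='B')
        mids.append(r[len(r) // 2])
    return ends.union(mids)
```

mid_and_end_month('2017-12-10', '2018-02-02') reproduces the expected ['2017-12-29', '2018-01-16', '2018-01-31']. Because the inner range uses freq='B', the midpoint is a business day by construction, so no extra roll-forward step is needed. (Recent pandas versions prefer the spelling 'BME' over 'BM' for the business month-end frequency.)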
I am trying to convert a dataframe column with a date and timestamp to a year-weeknumber format, i.e., 01-05-2017 03:44 = 2017-1. This is pretty easy; however, I am stuck on dates that are in a new year yet whose week number is still the last week of the previous year.
I did the following:
df['WEEK_NUMBER'] = df.date.dt.year.astype(str).str.cat(df.date.dt.week.astype(str), sep='-')
Where df['date'] is a very large column with date and times, ranging over multiple years.
A date which gives a problem is for example:
Timestamp('2017-01-01 02:11:27')
The output of my code will be 2017-52, while it should be 2016-52. Since the data covers multiple years, and week numbers and their corresponding dates change every year, I cannot simply subtract a few days.
Does anybody have an idea of how to fix this? Thanks!
Replace df.date.dt.year with this:
(df.date.dt.year - ((df.date.dt.week > 50) & (df.date.dt.month == 1)))
Basically, this subtracts 1 from the year value if the week number is greater than 50 and the month is January.
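For reference, since pandas 1.1 the ISO year is available directly through Series.dt.isocalendar(), which sidesteps the manual correction entirely; a minimal sketch:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2017-01-01 02:11:27', '2017-05-01 03:44:00']))
iso = s.dt.isocalendar()        # DataFrame with ISO 8601 year, week, day
labels = iso['year'].astype(str) + '-' + iso['week'].astype(str)
# 2017-01-01 falls in ISO week 52 of 2016, so its label is '2016-52'
```

The ISO year already rolls back (or forward) with the week number, so no month/week condition is needed.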