Is there a better solution than dt.weekofyear? [duplicate]

This question already has answers here:
How to get year and week number aligned for a date
(3 answers)
Closed 2 years ago.
Is there a better solution than df['weekofyear'] = df['date'].dt.weekofyear?
The problem with this solution is that, sometimes, the days after the last week of year n but before the first week of year n+1 are counted as week 1 and not as week 0.
I am working with pyspark and koalas (no pandas allowed).
Here is an example:
As you can see, the first column is the date, the second is the week, the third is the month, and the last is the year.

Not sure if this is what you want, but you can use a CASE WHEN expression to replace the undesired week-of-year values.
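import databricks.koalas as ks

# df is a Koalas DataFrame with date, month and year columns
df['weekofyear'] = df['date'].dt.weekofyear
# ks.sql substitutes {df} with the Koalas DataFrame named df in the calling scope
df2 = ks.sql("""
select
  date,
  -- a week 1 that falls in December belongs to the end of the year: call it week 53
  case when weekofyear = 1 and month = 12 then 53 else weekofyear end as weekofyear,
  month,
  year
from {df}""")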
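To check the rule outside Spark, here is a minimal plain-Python sketch of the same CASE WHEN logic. It assumes dt.weekofyear follows ISO 8601 week numbering, as datetime.date.isocalendar() does; the function name adjusted_week is hypothetical.

```python
from datetime import date


def adjusted_week(d: date) -> int:
    """Plain-Python mirror of the CASE WHEN: an ISO week 1 that falls in December becomes 53."""
    iso_week = d.isocalendar()[1]  # ISO 8601 week number, 1..53
    if iso_week == 1 and d.month == 12:
        # late-December days that ISO assigns to week 1 of the next year
        return 53
    return iso_week


# 2019-12-30 is a Monday that ISO counts as week 1 of 2020
print(adjusted_week(date(2019, 12, 30)))  # 53
print(adjusted_week(date(2019, 12, 29)))  # 52
```

Like the SQL answer, this only handles the December side: early-January days that ISO assigns to week 52 or 53 of the previous year are left unchanged.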

Related

Sort a dataframe by month and date, but excluding the year

I am working on a DataFrame with multiple years of data, with a timestamp for each value. I am struggling to split the data into summer and non-summer months. I am not sure how to tell pandas to get the data for dates from June 1 to September 30 while ignoring the year. I wrote some pseudo-code of what I want to achieve below, but obviously a timestamp would not work in this case. Thank you for the help, and I apologize for my lack of clarity.
Code is below:
# My goal is to get the summer months, June 1 to September 30
# I have multiple years; I only want to extract the summer months from each year
summer_start = pd.Timestamp(month=6, day=1)  # I recognize this will not work without a year. This is pseudo-code
summer_end = pd.Timestamp(month=9, day=30)
df['Is_Summer'] = df['Date'].apply(lambda x: summer_start <= x <= summer_end)

In your case, summer is simply months 6 through 9 inclusive, so you can do:
df['is_summer'] = df['Date'].dt.month.between(6,9)
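If the season does not start and end on month boundaries (say June 15 to September 15), one option is to encode each date as month * 100 + day and compare that number, which ignores the year. A minimal sketch, assuming the range does not wrap around New Year; the helper name in_season and the sample dates are hypothetical.

```python
import pandas as pd


def in_season(dates: pd.Series, start: tuple, end: tuple) -> pd.Series:
    """True where the (month, day) of each date lies in [start, end], ignoring the year."""
    month_day = dates.dt.month * 100 + dates.dt.day  # e.g. June 15 -> 615
    # between() is inclusive on both ends by default
    return month_day.between(start[0] * 100 + start[1], end[0] * 100 + end[1])


df = pd.DataFrame({'Date': pd.to_datetime(['2019-06-14', '2019-06-15', '2020-09-15', '2021-09-16'])})
df['is_summer'] = in_season(df['Date'], (6, 15), (9, 15))
print(df['is_summer'].tolist())  # [False, True, True, False]
```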

Remove last n days from dataframe

I have a pandas dataframe with a datetime index (30-minute frequency), and I want to remove the last n days from it. My dataframe does not include weekends, so if its last day is a Monday and n = 3, I want to remove Monday, Friday and Thursday (counting from the end). In other words, I mean observed days, not calendar days. What is the most pythonic way to do it?
Thanks.

Pandas treats Monday to Friday as business days.
So if you want to remove the last n business days from your dataframe, you can just do:
df.drop(df[df.index >= df.index.max().normalize() - pd.offsets.BDay(n - 1)].index, inplace=True)
If you really need to remove observed days in the dataframe, it will be slightly more complex because you have to count the days. The code could be (using a companion dataframe called df_days):
# create a companion dataframe with the same index and only one row per day:
df_days = pd.DataFrame(index=df.index).assign(day=df.index.date).drop_duplicates('day')
# number the observed days 1, 2, 3, ... in the companion dataframe
df_days['new_day'] = 1
df_days['days'] = df_days['new_day'].cumsum()
# first timestamp of the earliest of the last n observed days
ix = df_days.loc[df_days['days'] == df_days['days'].max() + 1 - n].index[0]
# drop the last n observed days (including ix itself) and delete the companion dataframe
df = df.drop(df.loc[df.index >= ix].index)
del df_days
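A shorter way to drop the last n observed days, assuming the index is sorted: take the unique normalized dates, pick the n-th one from the end as the cutoff, and keep only the rows before it. This is a sketch; drop_last_observed_days is a hypothetical helper name and the sample bars are made up.

```python
import pandas as pd


def drop_last_observed_days(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Drop the last n days that actually appear in a sorted DatetimeIndex."""
    if n <= 0:
        return df.copy()
    days = df.index.normalize().unique()  # observed calendar days, in index order
    if n >= len(days):
        return df.iloc[0:0]  # every observed day is removed: empty frame, same columns
    cutoff = days[-n]  # midnight of the earliest day to remove
    return df[df.index < cutoff]


# 30-minute bars on Thu 2020-01-02, Fri 2020-01-03 and Mon 2020-01-06 (no weekend rows)
ranges = [pd.date_range(f'{d} 09:00', f'{d} 16:30', freq='30min')
          for d in ['2020-01-02', '2020-01-03', '2020-01-06']]
idx = ranges[0].append(ranges[1:])
df = pd.DataFrame({'price': range(len(idx))}, index=idx)
trimmed = drop_last_observed_days(df, 2)  # removes Monday and Friday
```

Because the cutoff is taken from the days present in the index, weekends and holidays are skipped automatically.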

Calculating week number of year (dealing with first week of year)

I am trying to convert a date into its week number of the year.
In my case, a day of one year must never be counted in a week of another year.
isocalendar() works fine for finding the week number; however, it follows ISO 8601 rules: if the first week of January has fewer than 4 days, it is counted as the last week of the previous year.
So, this function returns:
date(2016, 1, 1).isocalendar()[1]
53
Is there a way, using this function, to return week 0 instead of week 53 (of the previous year)?

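How about this? The %U directive numbers weeks starting on Sunday and puts every day before the year's first Sunday in week 0 (%W does the same with Monday-based weeks):
import datetime
# strftime returns a zero-padded string such as '00', so convert it to int
int(datetime.date(2016, 1, 1).strftime("%U"))  # 0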
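If you would rather keep isocalendar()'s Monday-based ISO weeks but never let a day spill into the neighbouring year, one option is to inspect the ISO year it returns: early-January days that belong to the previous ISO year become week 0, and late-December days that belong to the next ISO year continue the current year's numbering. A sketch; week_within_year is a hypothetical name, and December 28 is used because it always falls in the last ISO week of its year.

```python
from datetime import date


def week_within_year(d: date) -> int:
    """ISO-style week number that never assigns a day to another calendar year."""
    iso_year, iso_week, _ = d.isocalendar()
    if iso_year < d.year:
        # early-January days that ISO puts in the previous year's last week
        return 0
    if iso_year > d.year:
        # late-December days that ISO puts in week 1 of the next year:
        # continue after this year's last ISO week (the one containing Dec 28)
        return date(d.year, 12, 28).isocalendar()[1] + 1
    return iso_week


print(week_within_year(date(2016, 1, 1)))    # 0  (isocalendar() gives 53)
print(week_within_year(date(2016, 1, 4)))    # 1
print(week_within_year(date(2019, 12, 30)))  # 53 (isocalendar() gives 1)
```

This keeps ISO numbering everywhere except at the two year boundaries.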

Pandas time series decomposition based on leap year [duplicate]

This question already has answers here:
Subtract a year from a datetime column in pandas
(4 answers)
Closed 4 years ago.
I have a pandas time series (called df) with one column, named data, that contains daily data over a period of five years. The following code produces some random data:
import pandas as pd
import numpy as np
df_index = pd.date_range('01-01-2012', periods=5 * 365 + 2, freq='D')
df = pd.DataFrame({'data': np.random.rand(len(df_index))}, index=df_index)
I want to perform a simple yearly trend decomposition, where for each day I subtract the value from one year earlier. Additionally, I want to account for leap years in the subtraction. Is there an elegant way to do that? My approach is to compute differences over 365 and 366 days and assign them to new columns.
df['diff_365'] = df['data'].diff(365)
df['diff_366'] = df['data'].diff(366)
Afterwards, I apply a function to each row that selects the right value based on whether the same date last year is 365 or 366 days ago.
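def decide(row):
    # go back 59 days and check whether that date lies in a leap year
    if (row.name - pd.Timedelta(days=59)).is_leap_year:
        return row['diff_366']
    else:
        return row['diff_365']
df['yearly_diff'] = df[['diff_365', 'diff_366']].apply(decide, axis=1)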
Explanation: the function decide takes as argument a row of the DataFrame consisting of the columns diff_365 and diff_366 (along with the DatetimeIndex). The expression row.name returns the date of the row; assuming the time series has daily frequency (freq='D'), 59 days are subtracted, which is the number of days from 1st January to 28th February inclusive. If the resulting date falls in a leap year, the value from the diff_366 column is returned; otherwise, the value from the diff_365 column.
This took 8 lines, and it feels like the subtraction could be done in one or two lines. I tried to apply a similar function directly to the data column (via apply with the default argument axis=0), but then I cannot take my DatetimeIndex into account. Is there a better way to perform the subtraction?

You may not need to worry about dealing with leap years explicitly. When you construct a DatetimeIndex, you can specify start and end parameters. As per the docs:
Of the four parameters start, end, periods, and freq, exactly three must be specified.
Here's an example of how you can restructure your logic:
df_index = pd.date_range(start='01-01-2012', end='12-31-2016', freq='D')
df = pd.DataFrame({'data': np.random.rand(len(df_index))}, index=df_index)
df['yearly_diff'] = df['data'] - (df_index - pd.DateOffset(years=1)).map(df['data'].get)
Explanation
We construct a DatetimeIndex object by supplying start, end and freq arguments.
Subtract 1 year from your index by subtracting pd.DateOffset(years=1).
Use Index.map with df['data'].get to look up the data value at each date one year earlier.
Subtract the resulting series from the original data series.
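An alternative to Index.map with Series.get is to reindex the series at the shifted dates; dates with no value one year earlier (all of 2012 here) become NaN. A sketch under the same setup as the answer; note that DateOffset(years=1) maps 2016-02-29 to 2015-02-28.

```python
import numpy as np
import pandas as pd

df_index = pd.date_range(start='2012-01-01', end='2016-12-31', freq='D')
df = pd.DataFrame({'data': np.random.rand(len(df_index))}, index=df_index)

# same calendar date one year earlier; Feb 29 is clipped to Feb 28 of the previous year
one_year_back = df_index - pd.DateOffset(years=1)

# look up last year's values by label; dates before the start of the data become NaN
previous = df['data'].reindex(one_year_back).to_numpy()
df['yearly_diff'] = df['data'] - previous  # positional subtraction, no index alignment
```

Converting to a NumPy array before subtracting avoids aligning on the shifted labels, which would otherwise pair each value with itself.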

how to find current day is weekday or weekends in Python? [duplicate]

This question already has answers here:
How do I get the day of week given a date?
(30 answers)
Closed 7 years ago.
How can I find whether a particular day is a weekday or a weekend in Python?

You can use the .weekday() method of a datetime.date object:
import datetime
weekno = datetime.datetime.today().weekday()
if weekno < 5:
    print("Weekday")
else:  # 5 Sat, 6 Sun
    print("Weekend")

Use the date.weekday() method. It returns an integer from 0 to 6, where 0 is Monday and 6 is Sunday.
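Wrapping the weekday() check in a helper makes the 0 (Monday) to 6 (Sunday) convention explicit. A minimal sketch; is_weekend is a hypothetical name, and it assumes Saturday and Sunday are the weekend.

```python
from datetime import date


def is_weekend(d: date) -> bool:
    """weekday() returns 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend."""
    return d.weekday() >= 5


print(is_weekend(date(2016, 1, 2)))  # True  (a Saturday)
print(is_weekend(date(2016, 1, 4)))  # False (a Monday)
```

If you prefer 1 (Monday) to 7 (Sunday), date.isoweekday() gives that numbering instead.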
