I have monthly data from 1989/09 to 2020/12 and want to convert it to weekly data (Friday to Friday) running from 1989/09/29 to 2020/12/25. Each week should simply carry the value of its month (e.g. every week in September gets the September value, every week in October gets the October value, and so on).
That's how I did it:
df.set_index(pd.DatetimeIndex(df["Date"]), inplace=True)  # set a DatetimeIndex so the frame can be resampled
[![First: Monthly Data ][1]][1]
[1]: https://i.stack.imgur.com/9e5DW.png
df.resample("W-FRI",axis = 0).ffill() #upsample to get weekly data
[![Output: Weekly data with correct values but wrong timespan][2]][2]
[2]: https://i.stack.imgur.com/QAOMX.png
I get a dataframe with the correct values, but its range is 1989/09/01 to 2020/12/04. I want the range to be
1989/09/29 to 2020/12/25, but I can't find the right argument to pass to the function.
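One possible way to get that range, as a minimal sketch rather than a definitive answer: resample only creates bins between the first and last timestamps of the monthly index, which would explain why the output stops at 2020/12/04. Building the Friday index explicitly with pd.date_range and looking up each Friday's month avoids that. The toy values and the names monthly, fridays and weekly are assumptions for illustration; the real values would come from df.

```python
import pandas as pd

# Hypothetical monthly data, stamped on the first day of each month.
monthly = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["1989-09-01", "1989-10-01", "1989-11-01"]),
)

# Explicit Friday-to-Friday index covering exactly the wanted range
# (shortened here; the question's range ends at 2020-12-25).
fridays = pd.date_range("1989-09-29", "1989-11-24", freq="W-FRI")

# Re-key the monthly values by calendar month, then look up each Friday's month.
by_month = monthly.copy()
by_month.index = by_month.index.to_period("M")
weekly = pd.Series(
    by_month.reindex(fridays.to_period("M")).to_numpy(),
    index=fridays,
)
print(weekly)
```

Each Friday gets the value of the month it falls in, so a week that straddles two months is assigned by its Friday date.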
Related
I have several days of data indexed at 30-minute intervals, Monday to Friday. Some dates may be missing (for example because of holidays). For every row I would like to find the previous week's highest value of column high and lowest value of column low. For example, if I am calculating for today, the previous week's high and low are the ones marked in yellow in the attached image.
I tried using rolling and resampling, but somehow neither worked. Can anyone help?
You really should add sample data to your question (by that I mean a piece of code/text that can easily be used to create a dataframe for illustrating how the proposed solution works).
Here's a suggestion, with df as your dataframe and a column datetime holding datetimes (not strings):
df["week"] = (
    df["datetime"].dt.isocalendar().year.astype(str)
    # zero-pad the week so the labels sort chronologically before .shift()
    + df["datetime"].dt.isocalendar().week.astype(str).str.zfill(2)
)
mask = df["high"] == df.groupby("week")["high"].transform("max")
df = df.merge(
    df[mask].rename(columns={"low": "high_low"})
    .groupby("week").agg({"high_low": "min"}).shift(),
    on="week", how="left"
).drop(columns="week")
Add a week column to df (year + week) for grouping along weeks.
Extract the rows with the weekly maximum highs by mask (there could be more than one for a week).
Build a corresponding dataframe with the weekly minimum of the lows corresponding to the weekly maximum highs (column named high_low), shift it once to get the value from the previous week, and .merge it to df.
If column datetime doesn't contain datetimes:
df["datetime"] = pd.to_datetime(df["datetime"])
If I have understood correctly, the solution should be:
Get the week number from the date.
Group by the week number and fetch the max of high and the min of low.
Group by the week and fetch the max date to get the last date of each week.
Merge all the dataframes into one on the date key.
Once these steps are done, you can apply any formatting you need.
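The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the sample rows are made up, the column names prev_week_high and prev_week_low are hypothetical, and weeks are labelled by their Monday instead of a week number so that the labels stay unambiguous across years.

```python
import pandas as pd

# Made-up intraday rows spanning three weeks.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2021-01-04 09:30", "2021-01-05 10:00",
        "2021-01-11 09:30", "2021-01-12 10:00",
        "2021-01-18 09:30",
    ]),
    "high": [10.0, 12.0, 11.0, 15.0, 13.0],
    "low": [8.0, 9.0, 7.0, 10.0, 12.0],
})

# Label each row with the Monday that starts its (Monday-Sunday) week.
df["week"] = df["datetime"].dt.to_period("W").dt.start_time

# Weekly high/low, shifted by one row so each week sees the previous week
# that is present in the data (not necessarily the previous calendar week).
weekly = (
    df.groupby("week")
    .agg(prev_week_high=("high", "max"), prev_week_low=("low", "min"))
    .shift()
)

# Attach the previous week's values to every row of that week.
df = df.join(weekly, on="week")
print(df)
```

Rows in the first week have no previous week, so their new columns are NaN.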
How do I resample monthly data to yearly data, with each year starting on 1 October?
I tried the following, since I know base works for starting at a certain hour of the day, but it doesn't appear to work for a month of the year.
df = (df.resample(rule='Y', base=10).sum().reset_index())
Here is how you do it:
offset = pd.DateOffset(months=9)
df.shift(freq=-offset).resample('YS').sum().shift(freq=offset)
Pandas has anchored offsets available for annual resamples starting at the first of a month.
The anchored offset for annual resampling starting in October is AS-OCT. Resampling and summing can be done like this:
df.resample("AS-OCT").sum()
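To compare the two answers, here is a small sketch on made-up data (24 months of ones, an assumption for illustration). It passes the offset object pd.offsets.YearBegin(month=10) instead of an alias string, because the alias spelling differs between pandas versions (recent releases prefer YS-OCT over AS-OCT).

```python
import pandas as pd

# Hypothetical monthly series: October 2018 through September 2020.
idx = pd.date_range("2018-10-01", periods=24, freq="MS")
s = pd.Series(1.0, index=idx)

# Approach 1: an annual offset anchored at the start of October.
anchored = s.resample(pd.offsets.YearBegin(month=10)).sum()

# Approach 2: shift back nine months, resample by calendar year, shift forward.
offset = pd.DateOffset(months=9)
shifted = (
    s.shift(freq=-offset)
    .resample(pd.offsets.YearBegin())
    .sum()
    .shift(freq=offset)
)

print(anchored)
print(shifted)
```

Both give one row per October-to-September year, labelled with 1 October.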
I have a pandas dataframe with 3 columns:
OrderID_new (integer)
OrderTotal (float)
OrderDate_new (string or datetime sometimes)
Sales order IDs are in the first column, order values (totals) in the second, and order dates, in mm/dd/yyyy format, in the last.
I need to do 2 things:
1) aggregate the order totals:
a) first into total sales per day and then
b) into total sales per calendar month
2) convert the values in OrderDate_new from mm/dd/yyyy format (e.g. 01/30/2015) into Month YYYY format (e.g. January 2015).
The problem is that some input files already have the third column (date) as datetime while others have it as strings, so sometimes string-to-datetime parsing is needed and in other cases the datetime only needs reformatting.
I have been trying to do 2 step aggregation with groupby but I'm getting some strange daily and monthly totals that make no sense.
What I need as the final stage is time series with 2 columns - 1. monthly sales and 2. month (Month Year)...
Then I will need to select and train some model for monthly sales time series forecast (out of scope for this question)...
What am I doing wrong?
How to do it effectively in Python?
dataframe example:
You did not provide usable sample data, so I've synthesized some.
resample() lets you roll up on a date column; I've provided daily and monthly totals.
pd.to_datetime() takes care of a column that mixes strings and datetimes.
import datetime as dt
import numpy as np
import pandas as pd

def mydf(size=10):
    return pd.DataFrame({"OrderID_new": np.random.randint(100, 200, size),
                         "OrderTotal": np.random.randint(200, 10000, size),
                         "OrderDate_new": np.random.choice(pd.date_range(dt.date(2019, 8, 1), dt.date(2020, 1, 1)), size)})
# turn the order date into a string for some rows
df = pd.concat([mydf(5), mydf(5).assign(OrderDate_new=lambda dfa: dfa.OrderDate_new.dt.strftime("%Y/%m/%d"))])
# make sure everything is a date
df.OrderDate_new = pd.to_datetime(df.OrderDate_new)
# daily and monthly totals ("M" is spelled "ME" from pandas 2.2 on)
df.resample("D", on="OrderDate_new")["OrderTotal"].sum()
df.resample("M", on="OrderDate_new")["OrderTotal"].sum()
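The question also asks for a Month Year label. A minimal sketch of that last step, assuming made-up orders and hypothetical output column names monthly_sales and month, could group by calendar month and format the period afterwards:

```python
import pandas as pd

# Made-up orders with mm/dd/yyyy strings, as described in the question.
orders = pd.DataFrame({
    "OrderID_new": [1, 2, 3, 4],
    "OrderTotal": [100.0, 50.0, 25.0, 10.0],
    "OrderDate_new": ["01/30/2015", "01/30/2015", "01/31/2015", "02/01/2015"],
})

# Parse the strings; a column that is already datetime passes through unchanged.
orders["OrderDate_new"] = pd.to_datetime(orders["OrderDate_new"], format="%m/%d/%Y")

# a) total sales per day
daily = orders.groupby(orders["OrderDate_new"].dt.normalize())["OrderTotal"].sum()

# b) total sales per calendar month, then a "Month YYYY" label
monthly = orders.groupby(orders["OrderDate_new"].dt.to_period("M"))["OrderTotal"].sum()
result = monthly.rename("monthly_sales").reset_index()
# %B is the full month name in the current locale (English assumed here).
result["month"] = result["OrderDate_new"].dt.strftime("%B %Y")
result = result[["monthly_sales", "month"]]
print(result)
```

Grouping on the period and formatting only at the end keeps the months in chronological order, which a string label such as "January 2015" would not.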
My DataFrame has the following format:
I resampled the values to monthly means, but although the datetime index starts at 2017-07-08, the Date column after grouping by month and taking the mean starts at 2017-01-31. (There is no data at all in my DataFrame from January 2017 to August 2017; recording started in August 2017.)
Could you please give me some insights to understand what is happening?
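Since no data is shown, the following is only a guess at one common cause: day-first date strings parsed as month-first. A first row read as 2017-07-08 could really be 7 August 2017, and a later row such as 1 September would then be read as 9 January, which pulls the resampled range back to January. The strings below are hypothetical.

```python
import pandas as pd

# Hypothetical day-first strings: 7 Aug, 1 Sep and 2 Oct 2017.
raw = ["07/08/2017", "01/09/2017", "02/10/2017"]
values = [1.0, 2.0, 3.0]

# Parsed month-first, the dates become 8 Jul, 9 Jan and 10 Feb 2017.
wrong = pd.Series(values, index=pd.to_datetime(raw, format="%m/%d/%Y")).sort_index()
# Parsed day-first, they are the intended dates.
right = pd.Series(values, index=pd.to_datetime(raw, format="%d/%m/%Y")).sort_index()

# Month-end resampling starts at the earliest parsed month.
first_wrong = wrong.resample(pd.offsets.MonthEnd()).mean().index[0]
first_right = right.resample(pd.offsets.MonthEnd()).mean().index[0]
print(first_wrong, first_right)
```

If this is what happened, checking df.index.min() and re-parsing with an explicit format (or dayfirst=True) should confirm it.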
I have a dataframe in pandas with dates as the index, in "YYYY-MM-DD" format.
The dataframe has a lot of rows, and therefore a lot of dates in the index.
Most of these dates are consecutive daily dates, but some are weekly and some are yearly.
Example:
2015-01-05,
2015-01-06,
2015-01-07,
2015-01-08,
2015-01-09,
2015-01-16,
2015-01-23,
2015-01-30,
2015-02-28,
2015-03-30
So some of them are daily dates, possibly followed by several weekly, monthly or yearly dates.
How can I tell over which date ranges the index is daily, weekly, monthly or yearly?
Remark: the daily dates only include working days (Monday - Friday).
For weekly dates, the Friday date is shown.
For monthly/quarterly/yearly dates, the last day of the month/quarter/year is shown.
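One rough way to approach this, sketched under assumptions, is to label each date by the gap to the previous date and then collapse consecutive equal labels into runs. The thresholds in label_gap are hypothetical and would need tuning, for example for long holiday breaks in the daily stretch.

```python
import pandas as pd

idx = pd.DatetimeIndex([
    "2015-01-05", "2015-01-06", "2015-01-07", "2015-01-08", "2015-01-09",
    "2015-01-16", "2015-01-23", "2015-01-30", "2015-02-28", "2015-03-30",
])

# Days between each date and the one before it (NaN for the first date).
gaps = idx.to_series().diff().dt.days

def label_gap(days):
    """Map a gap in days to a frequency label (thresholds are assumptions)."""
    if pd.isna(days):
        return "start"
    if days <= 4:      # 1 day, or a few days across a weekend/holiday
        return "daily"
    if days <= 10:
        return "weekly"
    if days <= 45:
        return "monthly"
    if days <= 135:
        return "quarterly"
    return "yearly"

labels = gaps.map(label_gap)

# Consecutive equal labels form one run; report each run's first and last date.
runs = (labels != labels.shift()).cumsum()
summary = (
    pd.DataFrame({"date": idx, "label": labels.to_numpy()})
    .groupby(runs.to_numpy())
    .agg(label=("label", "first"), start=("date", "min"), end=("date", "max"))
)
print(summary)
```

Each label describes the step that ends at that date, so the first date of a new regime already carries the new label.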