I am trying to work out how to get year-to-date (YTD) and last-year-to-date (LYTD) values from a dataframe.
Dataframe:
ID start_date distance
1 2019-7-25 2
2 2019-7-26 2
3 2020-3-4 1
4 2020-3-4 1
5 2020-3-5 3
6 2020-3-6 3
There is data going back to 2017 and more will keep being added, so I would like YTD and LYTD to be dynamic based on the current year.
I know how to get the cumulative sum for each year and month, but I am really struggling with how to calculate YTD and LYTD.
year_month_distance_df = distance_kpi_df.groupby(["Start_Year","Start_Month"]).agg({"distance":"sum"}).reset_index()
The other code I tried:
cum_sum_distance_ytd = distance_kpi_df[["start_date_local", "distance"]]
cum_sum_distance_ytd = cum_sum_distance_ytd.set_index("start_date_local")
cum_sum_distance_ytd = cum_sum_distance_ytd.groupby(pd.Grouper(freq = "D")).sum()
When I try this logic and add Start_Day to the groupby, it obviously just sums all the data for that day.
Expected output:
Year to Date = 8
Last Year to Date = 4
You could split the date into its components and get the YTD for all years with (assuming start_date is a datetime column; convert it with pd.to_datetime first if needed):
expanding = df.groupby([
    df.start_date.dt.month, df.start_date.dt.day, df.start_date.dt.year
]).distance.sum().unstack().cumsum()
Unstacking fills with np.nan wherever a year has no value for that row's date. If that is a problem, you can use the fill_value parameter:
.unstack(fill_value=0).cumsum()
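If what you ultimately need are just the two scalar figures from the question, a minimal sketch that filters on today's date directly (again assuming start_date has been converted with pd.to_datetime) could be:
import pandas as pd

today = pd.Timestamp.today()

# rows whose month/day fall on or before today's month/day
# (comparing month and day separately avoids leap-year issues with dayofyear)
through_today = (df.start_date.dt.month < today.month) | (
    (df.start_date.dt.month == today.month) & (df.start_date.dt.day <= today.day)
)

ytd = df.loc[(df.start_date.dt.year == today.year) & through_today, "distance"].sum()
lytd = df.loc[(df.start_date.dt.year == today.year - 1) & through_today, "distance"].sum()
With the sample rows above, this reproduces Year to Date = 8 and Last Year to Date = 4 once the current date is past 26 July (before that, the two July 2019 rows would not yet count towards LYTD).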
I have looked for solutions but found none that point me in the right direction; hopefully someone here can help. I have a stock price data set with a frequency of Month Start. I am trying to get an output where the calendar years are the column names and the day and month form the index (there will only be 12 rows since the data is monthly). The rows will be filled with the stock prices corresponding to the year and month. Unfortunately I have no code so far: I have looked at for loops, groupby, etc., but can't seem to figure this one out.
You might want to split the date into month and year and apply a pivot:
s = pd.to_datetime(df.index)
out = (df
       .assign(year=s.year, month=s.month)
       .pivot_table(index='month', columns='year', values='Close', fill_value=0)
      )
output:
year 2003 2004
month
1 0 2
2 0 3
3 0 4
12 1 0
Used input:
df = pd.DataFrame({'Close': [1,2,3,4]},
index=['2003-12-01', '2004-01-01', '2004-02-01', '2004-03-01'])
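If you specifically want the day and month together in the index, as described in the question, one possible variation (the 'monthday' name and the '%d-%m' format are my own choices, not from the original answer) would be:
out = (df
       .assign(year=s.year, monthday=s.strftime('%d-%m'))
       .pivot_table(index='monthday', columns='year', values='Close', fill_value=0)
      )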
You need multiple steps to do that.
First split your column into the right format.
Then convert this column into two separate columns.
Then pivot the table accordingly.
import pandas as pd
# Test Dataframe
df = pd.DataFrame({'Date': ['2003-12-01', '2004-01-01', '2004-02-01', '2004-12-01'],
                   'Close': [6.661, 7.053, 6.625, 8.999]})
# Split datestring into list of form [year, month-day]
df = df.assign(Date=df.Date.str.split(pat='-', n=1))
# Separate date-list column into two columns
df = pd.DataFrame(df.Date.to_list(), columns=['Year', 'Date'], index=df.index).join(df.Close)
# Pivot the table
df = df.pivot(columns='Year', index='Date')
df
Output:
Close
Year 2003 2004
Date
01-01 NaN 7.053
02-01 NaN 6.625
12-01 6.661 8.999
I have a column "date" with values in MM/DD/YYYY format. How can I create a new column called "week" containing the week number, where weeks start on Sunday rather than Monday?
I would need more data, but I believe you could do something like this:
import datetime

df['Sale_Date'] = pd.to_datetime(df['Sale_Date'], infer_datetime_format=True)
df['Week'] = df['Sale_Date'] + datetime.timedelta(days=1)
df['Week'] = df['Week'].dt.isocalendar().week
This will push all the dates forward one day and then get the week, so a Sunday date is read as a Monday, giving you the expected week. However, I received a different week number than you, so I'm not sure whether you are using a different week function than I am.
df['Sale_Date'] = pd.to_datetime(df.Sale_Date)
df['Week'] = df.Sale_Date.dt.strftime('%U')
df
ID sale_amt Sale_Date Week
0 1 100 2022-06-10 23
1 2 200 2022-06-05 23
2 3 250 2022-06-04 22
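Note that strftime returns strings, so the Week values above are text like '23'. If you prefer integer week numbers, one small adjustment (my addition, not part of the answer above) would be:
df['Week'] = df.Sale_Date.dt.strftime('%U').astype(int)  # cast the zero-padded string to int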
I have a dataframe which looks like this
In []: df.head()
Out [] :
DATE NAME AMOUNT CURRENCY
2018-07-27 John 100 USD
2018-06-25 Jane 150 GBP
...
The contents under the DATE column are of date type.
I want to aggregate the data so I can see, for each day of the month, the count of transactions that happened on that day.
I also want to group it by year as well as by day.
The end result I want would look something like this:
YEAR DAY COUNT
2018 1 0
2 1
3 0
4 0
5 3
6 4
and so on
I used the following code, but the numbers are all wrong. Please help.
In []: df = pd.DataFrame({'DATE':pd.date_range(start=dt.datetime(2018,7,27),end=dt.datetime(2020,7,21))})
df.groupby([df['DATE'].dt.year, df['DATE'].dt.day]).agg({'count'})
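A minimal sketch of one way this could be done, assuming DATE is already a datetime column (the key difference from the attempt above is counting rows with size() rather than agg):
import pandas as pd

counts = (
    df.groupby([df['DATE'].dt.year.rename('YEAR'),
                df['DATE'].dt.day.rename('DAY')])
      .size()                                             # transactions per (year, day of month)
      .unstack(fill_value=0)                              # rows = years, columns = days
      .reindex(columns=pd.RangeIndex(1, 32, name='DAY'),  # ensure every day 1-31 appears, 0 if absent
               fill_value=0)
      .stack()
      .rename('COUNT')
      .reset_index()
)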
I have a table in Excel where the column headers are months and the rows are days. I need to get today's date, which I have already done. Once I have it, I need to match the month and day against the column "cy_day".
Example:
If today's date is Jan 3, then it should only return "2".
Excel File:
cy_day jan feb mar
1 1 1 1
2 3 2 4
3 4 4 5
4 7 5 6
import pandas as pd
from pandas import DataFrame
import calendar
cycle_day_path = 'test\\Documents\\cycle_day_calendar.xlsx'
df = pd.read_excel(cycle_day_path)
df = DataFrame(df, index=None)
print(df)
month = pd.to_datetime('today').strftime("%b")
day = pd.to_datetime('today').strftime("%d")
Try this:
today = pd.Timestamp('2019-01-03')
col = today.strftime('%b').lower()
df[df[col] == today.day]
Given you've extracted the month using '%b', it should just be this, after correcting for the capitalized '%b' month name (see http://strftime.org/) and converting day, which strftime returns as a zero-padded string, back to an integer:
df.loc[df[month.lower()] == int(day), 'cy_day']
Now for Jan 3 you will get 2 (as a Series). If you want just the number 2, do:
df.loc[df[month.lower()] == int(day), 'cy_day'].values[0]
The value of the month variable as returned by pd.to_datetime('today').strftime("%b") is a capitalized string, so in order to use it to access a column of your dataframe you should lowercase it.
So first you should do
month = month.lower()
After that you need to make sure that the values in your month columns are of type str, since you are going to compare them with a str value:
day_of_month = df[month] == day
df["cy_day"][day_of_month]
If they are not of type str, you should instead convert the day variable to the same type as the month columns.
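For instance, a small sketch of that conversion, assuming the month columns hold integers as in the sample table above:
# strftime("%d") returns a zero-padded string such as "03",
# so cast it to int before comparing with the integer month column
day_of_month = df[month] == int(day)
df.loc[day_of_month, "cy_day"]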
I am trying to fetch data for the same day of previous weeks and then take the average of the value ("current_demand") as today's forecast (prediction).
For example:
Today is Monday, so I want to fetch the last two weeks' Monday data for the same time (block) and then take the average of ["current_demand"] to predict today's value.
Input Data:
current_demand Date Blockno weekday
18839 01-06-2018 1 4
18836 01-06-2018 2 4
12256 02-06-2018 1 5
12266 02-06-2018 2 5
17957 08-06-2018 1 4
17986 08-06-2018 2 4
18491 09-06-2018 1 5
18272 09-06-2018 2 5
Expected result:
18398 15-06-2018 1 4
Something like that: I want to take the values for the same block and the same weekday from the previous two weeks, then average them to calculate the next value.
I have tried something:
def forecast(DATA):
    df = DATA
    day = {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 4: 'Friday', 5: 'Saturday', 6: 'Sunday'}
    df.friday = day - timedelta(days=day.weekday() + 3)
    print(df)

forecast(DATA)
Please suggest something. Thank you in advance.
I like relativedelta for this kind of job:
import datetime
from dateutil.relativedelta import relativedelta

(datetime.datetime.today() + relativedelta(weeks=-2)).date()
Output:
datetime.date(2018, 7, 23)
Without the actual structure of your df it's hard to provide a solution tailored to your needs.
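As a rough sketch of how that could look for the sample data above (assuming Date is parsed day-first and that the goal is the mean of current_demand for the same Blockno exactly one and two weeks earlier):
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

def predict_demand(df, target_date, block_no):
    """Average current_demand for the same block exactly one and two weeks before target_date."""
    previous = df[
        df['Date'].isin([target_date - pd.Timedelta(weeks=1),
                         target_date - pd.Timedelta(weeks=2)])
        & (df['Blockno'] == block_no)
    ]
    return previous['current_demand'].mean()

predict_demand(df, pd.Timestamp('2018-06-15'), 1)   # -> 18398.0, matching the expected row above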