I have a table in Excel where the column headers are months and the rows are days. I have already retrieved today's date. Now I need to match the current month and day against the table and return the corresponding value from the "cy_day" column.
Example:
If today's date is Jan 3, it should return only "2".
Excel File:
cy_day jan feb mar
1 1 1 1
2 3 2 4
3 4 4 5
4 7 5 6
import pandas as pd
from pandas import DataFrame
import calendar
cycle_day_path = 'test\\Documents\\cycle_day_calendar.xlsx'
df = pd.read_excel(cycle_day_path)
df = DataFrame(df, index=None)
print(df)
month = pd.to_datetime('today').strftime("%b")
day = pd.to_datetime('today').strftime("%d")
Try this:
today = pd.Timestamp('2019-01-03')
col = today.strftime('%b').lower()
df[df[col] == today.day]
Since you extracted the month with '%b', which returns a capitalized abbreviation such as 'Jan' (see http://strftime.org/), lowercase it to match the column name. Note also that strftime("%d") returns a zero-padded string such as '03', so convert it with int(day) if the month columns hold integers:
df.loc[df[month.lower()] == int(day), 'cy_day']
For Jan 3 this returns 2 (as a Series). If you want just the number 2, do:
df.loc[df[month.lower()] == int(day), 'cy_day'].values[0]
The value of the month variable returned by pd.to_datetime('today').strftime("%b") is a capitalized string, so you need to lowercase it before using it to access a column of your dataframe.
So first you should do:
month = month.lower()
After that, you need to make sure that the values in your month columns are of type str, since you are going to compare them with a str value:
day_of_month = df[month] == day
df["cy_day"][day_of_month]
If they are not of type str, you should convert the day variable to the same type as the month columns.
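To make the lookup discussed in these answers concrete, here is a minimal, self-contained sketch. It assumes the month columns hold integers and builds the table in memory rather than reading the Excel file; the helper name cycle_day is made up for illustration, and strftime("%b") is assumed to run under an English locale.

```python
import pandas as pd

# In-memory copy of the table from the question (stands in for pd.read_excel).
df = pd.DataFrame({
    "cy_day": [1, 2, 3, 4],
    "jan": [1, 3, 4, 7],
    "feb": [1, 2, 4, 5],
    "mar": [1, 4, 5, 6],
})

def cycle_day(table, when):
    """Return the cy_day whose month column equals the day of `when`, or None."""
    column = when.strftime("%b").lower()  # e.g. "Jan" -> "jan"
    # Compare integers with integers: `when.day` is an int, unlike strftime("%d").
    matches = table.loc[table[column] == when.day, "cy_day"]
    return None if matches.empty else int(matches.iloc[0])

print(cycle_day(df, pd.Timestamp("2019-01-03")))  # 2
```

Returning None for days that do not appear in the month column (for example Jan 2 in this table) avoids the IndexError that .values[0] would raise on an empty result.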
Related
quarter = pd.Timestamp(dt.date(2020, 1, 1)).quarter
assert quarter == 1
df['quarter'] = df['date'].dt.quarter
This returns a 1,2,3 or 4 in df['quarter'] depending on the date in column df['date'].
What I would like to have is this format in column df['quarter']:
Qx-2019 or Qx-2020 depending on the year, where x is the quarter found with the script above.
How can I get the year that belongs to each quarter and produce the format Qx-year?
Thank you.
Try with to_period
s.dt.to_period('Q')
Out[159]:
0 2020Q4
1 2019Q1
dtype: period[Q-DEC]
Update
'Q' + df['date'].dt.quarter.astype(str) + '-' + df['date'].dt.year.astype(str)
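As a quick check of the string-concatenation approach, here is a small sketch on made-up dates (the two sample dates are assumptions chosen to match the 2020Q4 and 2019Q1 periods shown above). It also shows the same label built from the to_period result.

```python
import pandas as pd

# Two illustrative dates, one in Q4 2020 and one in Q1 2019.
df = pd.DataFrame({"date": pd.to_datetime(["2020-11-15", "2019-02-01"])})

# Build "Qx-YYYY" directly from the datetime accessors.
df["quarter"] = (
    "Q" + df["date"].dt.quarter.astype(str)
    + "-" + df["date"].dt.year.astype(str)
)

# The same label derived from quarterly periods.
periods = df["date"].dt.to_period("Q")
df["quarter_from_period"] = (
    "Q" + periods.dt.quarter.astype(str) + "-" + periods.dt.year.astype(str)
)

print(df["quarter"].tolist())  # ['Q4-2020', 'Q1-2019']
```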
I am trying to solve for how to get the values of year to date versus last year to date from a dataframe.
Dataframe:
ID start_date distance
1 2019-7-25 2
2 2019-7-26 2
3 2020-3-4 1
4 2020-3-4 1
5 2020-3-5 3
6 2020-3-6 3
There is data back to 2017 and more data will keep getting added so I would like the YTD and LYTD to be dynamic based upon the current year.
I know how to get the cumulative sum for each year and month but I am really struggling with how to calculate the YTD and LYTD.
year_month_distance_df = distance_kpi_df.groupby(["Start_Year","Start_Month"]).agg({"distance":"sum"}).reset_index()
The other code I tried:
cum_sum_distance_ytd = distance_kpi_df[["start_date_local","distance"]]
cum_sum_distance_ytd = cum_sum_distance_ytd.set_index("start_date_local")
cum_sum_distance_ytd = cum_sum_distance_ytd.groupby(pd.Grouper(freq = "D")).sum()
When I try this logic and add Start_Day into the group by it obviously just sums all the data for that day.
Expected output:
Year to Date = 8
Last Year to Date = 4
You could split the date into its components (this assumes start_date is a datetime column, so the .dt accessor is available) and get the YTD for all years with
expanding = df.groupby([
df.start_date.dt.month, df.start_date.dt.day, df.start_date.dt.year
]).distance.sum().unstack().cumsum()
Unstacking will fill with np.nan wherever any year does not have a value in the row's date... if that is a problem you can use the fill_value parameter
.unstack(fill_value=0).cumsum()
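If only the two totals are needed, a simpler alternative to the unstacked table is to filter by date range. The sketch below is one possible way, not the approach from the answer above: the function name ytd_and_lytd is made up, and the reference date 2020-08-01 is an assumption chosen so that the sample data reproduces the expected output from the question.

```python
import pandas as pd

# Sample data from the question, with zero-padded dates.
df = pd.DataFrame({
    "start_date": pd.to_datetime([
        "2019-07-25", "2019-07-26", "2020-03-04",
        "2020-03-04", "2020-03-05", "2020-03-06",
    ]),
    "distance": [2, 2, 1, 1, 3, 3],
})

def ytd_and_lytd(frame, today):
    """Sum distance from Jan 1 to `today`, and over the same span one year earlier."""
    today = pd.Timestamp(today).normalize()
    same_day_last_year = today - pd.DateOffset(years=1)
    dates = frame["start_date"]
    # Series.between is inclusive on both ends by default.
    ytd = dates.between(pd.Timestamp(today.year, 1, 1), today)
    lytd = dates.between(pd.Timestamp(today.year - 1, 1, 1), same_day_last_year)
    return int(frame.loc[ytd, "distance"].sum()), int(frame.loc[lytd, "distance"].sum())

ytd, lytd = ytd_and_lytd(df, "2020-08-01")
print(ytd, lytd)  # 8 4
```

Passing pd.Timestamp("today") instead of a fixed date keeps both figures dynamic as new rows are added.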
Year_qtr GDP ADJ_GDP
2 1947q1 243.1 1934.5
3 1947q2 246.3 1932.3
4 1948q3 250.1 1930.3
5 1949q4 260.3 1960.7
I tried parse() from the dateutil package, but it didn't work.
The resulting dataframe should have the 'Year_qtr' column as date values instead of object.
pandas can already do this out of the box! You can cast to datetime right away:
import pandas as pd
df = pd.DataFrame({'Year_qtr': ['1947q1', '1947q2', '1948q3', '1949q4']})
df['datetime'] = pd.to_datetime(df['Year_qtr'])
# df
# Year_qtr datetime
# 0 1947q1 1947-01-01
# 1 1947q2 1947-04-01
# 2 1948q3 1948-07-01
# 3 1949q4 1949-10-01
# vice versa you can do
df['datetime'].dt.to_period("Q")
# 0 1947Q1
# 1 1947Q2
# 2 1948Q3
# 3 1949Q4
# Name: datetime, dtype: period[Q-DEC]
You can't store quarter in a datetime object. You can have them separately:
# Split year and quarter information
year, quarter = map(int, year_column.split('q'))
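The split shown above works on a single string. For a whole column, a vectorized sketch could look like the following; it assumes every value has the form YYYYqN and maps each quarter to the first day of its first month, which is one convention among several.

```python
import pandas as pd

df = pd.DataFrame({"Year_qtr": ["1947q1", "1947q2", "1948q3", "1949q4"]})

# Split "1947q1" into integer year and quarter columns.
parts = df["Year_qtr"].str.split("q", expand=True).astype(int)
parts.columns = ["year", "quarter"]

# Quarter 1 -> month 1, quarter 2 -> month 4, and so on.
df["quarter_start"] = pd.to_datetime(
    pd.DataFrame({
        "year": parts["year"],
        "month": 3 * (parts["quarter"] - 1) + 1,
        "day": 1,
    })
)

print(df)
```

Keeping the integer year and quarter in `parts` means both pieces of information stay available next to the datetime column.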
Essentially, I want to apply some lambda function(?) of some sort to apply to a column in my dataframe that contains dates. Originally, I used dt.week to extract the week number but the calendar dates don't match up with the fiscal year I'm using (Apr 2019 - Mar 2020).
I have tried using pandas' to_period('Q-MAR') but that seems to be a little bit off. I have been researching other ways but nothing seems to work properly.
Apr 1 2019 -> Week 1
Apr 3 2019 -> Week 1
Apr 30 2019 -> Week 5
May 1 2019 -> Week 5
May 15 2019 -> Week 6
Thank you for any advice or tips in advance!
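You can create a DataFrame which contains the start date of each week. Use unambiguous ISO dates (a string like '01/04/2019' is parsed month-first, as January 4) and anchor the weeks on the weekday the fiscal year starts on (April 1, 2019 is a Monday):
date_rng = pd.date_range(start='2019-04-01', end='2020-03-31', freq='W-MON')
df = pd.DataFrame(date_rng, columns=['date'])
You can then query df for the last index whose date is smaller than or equal to the value:
df.index[df.date <= query_date][-1] + 1
This outputs the largest index whose date is smaller than or equal to the date you want to examine, plus 1 because the index is zero-based while weeks are counted from 1. I imagine you can pour this into a lambda yourself?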
NOTE
This solution has limitations, the biggest one being you have to manually define the datetime dataframe.
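A compact way to wrap this lookup in a function is sketched below. It assumes the fiscal year starts on Monday, April 1, 2019 and that weeks run Monday to Sunday; the name fiscal_week is made up for illustration. Under that convention the first four examples from the question come out as listed, but other week conventions would give different numbers for later dates.

```python
import pandas as pd

# One entry per fiscal week: the Monday each week starts on.
week_starts = pd.date_range(start="2019-04-01", end="2020-03-31", freq="W-MON")

def fiscal_week(query_date):
    """Return the 1-based fiscal week containing `query_date`."""
    # side="right" counts how many week starts are <= query_date.
    return int(week_starts.searchsorted(pd.Timestamp(query_date), side="right"))

dates = pd.Series(pd.to_datetime(["2019-04-01", "2019-04-03", "2019-04-30", "2019-05-01"]))
print(dates.map(fiscal_week).tolist())  # [1, 1, 5, 5]
```

Dates before the first week start return 0, which can serve as a signal that the date falls outside the fiscal year.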
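I created a fiscal calendar that can later be adapted into a function in Spark (this assumes an existing SparkSession named spark):
from pyspark.sql.functions import col, explode, expr, weekofyear, year
import numpy as np
import pandas as pd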
beginDate = '2016-01-01'
endDate = '2021-12-31'
#create empty dataframe
df = spark.createDataFrame([()])
#create date from given date range
df1 = df.withColumn("date",explode(expr(f"sequence(to_date('{beginDate}'), to_date('{endDate}'), interval 1 day)")))
# get week
df1 = df1.withColumn('week',weekofyear(col("date"))).withColumn('year',year(col("date")))
#translate to use pandas in python
df1 = df1.toPandas()
#get fiscal year
df1['financial_year'] = df1['date'].map(lambda x: x.year if x.month > 3 else x.year-1)
df1['date'] = pd.to_datetime(df1['date'])
#get calendar qtr
df1['quarter_old'] = df1['date'].dt.quarter
#get fiscal calendar
df1['quarter'] = np.where(df1['financial_year']< (df1['year']),df1['quarter_old']+3,df1['quarter_old'])
df1['quarter'] = np.where(df1['financial_year'] == (df1['year']),df1['quarter_old']-1,df1['quarter'])
#get fiscal week by shifting according to the number of months the fiscal year differs from the usual calendar
df1["fiscal_week"] = df1.week.shift(91)
df1 = df1.loc[(df1['date'] >= '2020-01-01')]
display(df1)  # Databricks notebook helper; use print(df1) elsewhere
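For comparison, the fiscal year and fiscal quarter for an April-to-March year can also be derived in pandas alone with quarterly periods anchored on March. This is a sketch rather than a replacement for the Spark code above; the sample dates are made up, and subtracting 1 from qyear is a labeling choice that matches the financial_year convention used above (the year in which the fiscal year starts).

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(
    ["2019-04-01", "2019-12-31", "2020-01-15", "2020-04-01"]
))

# "Q-MAR" means the fiscal year ends in March, so April starts fiscal Q1.
fiscal = dates.dt.to_period("Q-MAR")

fiscal_calendar = pd.DataFrame({
    "date": dates,
    # qyear is the year the fiscal year ends in; shift to the starting year.
    "financial_year": fiscal.dt.qyear - 1,
    "quarter": fiscal.dt.quarter,
})

print(fiscal_calendar)
```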
I have the following pandas data frame:
Shortcut_Dimension_4_Code Stage_Code
10225003 2
8225003 1
8225004 3
8225005 4
It is part of a much larger dataset that I need to be able to filter by month and year. I need to pull the fiscal year from the first two digits for values larger than 9999999 in the Shortcut_Dimension_4_Code column, and the first digit for values less than or equal to 9999999. That value needs to be added to "20" to produce a year i.e. "20" + "8" = 2008 | "20" + "10" = 2010.
That year "2008, 2010" needs to be combined with the stage code value (1-12) to produce a month/year, i.e. 02/2010.
The date 02/2010 then needs to be converted from a fiscal year date to a calendar year date, i.e. Fiscal Year Date: 02/2010 = Calendar Year Date: 08/2009. The resulting date needs to be presented in a new column. The resulting df would end up looking like this:
Shortcut_Dimension_4_Code Stage_Code Date
10225003 2 08/2009
8225003 1 07/2007
8225004 3 09/2007
8225005 4 10/2007
I am new to pandas and python and could use some help. I am beginning with this:
Shortcut_Dimension_4_Code Stage_Code CY_Month Fiscal_Year
0 10225003 2 8.0 10
1 8225003 1 7.0 82
2 8225003 1 7.0 82
3 8225003 1 7.0 82
4 8225003 1 7.0 82
I used .map and .str methods to produce this df, but have not been able to figure out how to get the FY's right, for fy 2008-2009.
In the code below, I'll assume Shortcut_Dimension_4_Code is an integer. If it's a string, you can convert it or slice it like this: df['Shortcut_Dimension_4_Code'].str[:-6]. More explanations are in the comments alongside the code.
That should work as long as you don't have to deal with empty values.
import pandas as pd
import numpy as np
from datetime import date
from dateutil.relativedelta import relativedelta
fiscal_month_offset = 6
input_df = pd.DataFrame(
[[10225003, 2],
[8225003, 1],
[8225004, 3],
[8225005, 4]],
columns=['Shortcut_Dimension_4_Code', 'Stage_Code'])
# make a copy of input dataframe to avoid modifying it
df = input_df.copy()
# numpy will help us with numeric operations on large collections
df['fiscal_year'] = 2000 + np.floor_divide(df['Shortcut_Dimension_4_Code'], 1000000)
# loop with `apply` to create `date` objects from available columns
# day is a required field in date, so we'll just use 1
df['fiscal_date'] = df.apply(lambda row: date(row['fiscal_year'], row['Stage_Code'], 1), axis=1)
df['calendar_date'] = df['fiscal_date'] - relativedelta(months=fiscal_month_offset)
# by default Python dates will be saved as object dtype in pandas. You can verify with `df.info()`
# to use the clever things pandas can do with dates, we need to convert it
df['calendar_date'] = pd.to_datetime(df['calendar_date'])
# I would just keep date as datetime type so I could access year and month
# but to create same representation as in question, let's format it as string
df['Date'] = df['calendar_date'].dt.strftime('%m/%Y')
# copy important columns into output dataframe
output_df = df[['Shortcut_Dimension_4_Code', 'Stage_Code', 'Date']].copy()
print(output_df)
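The same result can also be obtained without a row-wise apply. The following is a sketch of a vectorized variant under the same assumptions as the code above (integer codes, no missing values, a six-month fiscal offset); it builds the dates with pd.to_datetime from year/month/day columns and shifts them with pd.DateOffset.

```python
import pandas as pd

fiscal_month_offset = 6

df = pd.DataFrame({
    "Shortcut_Dimension_4_Code": [10225003, 8225003, 8225004, 8225005],
    "Stage_Code": [2, 1, 3, 4],
})

# First day of each fiscal month; the leading digits of the code give the year.
fiscal_start = pd.to_datetime(pd.DataFrame({
    "year": 2000 + df["Shortcut_Dimension_4_Code"] // 1_000_000,
    "month": df["Stage_Code"],
    "day": 1,
}))

# Shift from fiscal to calendar dates and format as MM/YYYY.
calendar_start = fiscal_start - pd.DateOffset(months=fiscal_month_offset)
df["Date"] = calendar_start.dt.strftime("%m/%Y")

print(df)
```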