I have a minute-frequency dataframe that I grouped by day:

day_grouped_df = minute_df.groupby(pd.Grouper(freq='D'))

Now I want to loop through each group and, inside the loop, find the previous group's date:

for date, group_row in day_grouped_df:
    # here I want to get the previous group's date

How can I fetch the previous group's date when looping through the groups? Is there any way to get an index like in normal looping, so that I can do (index - 1)?
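A minimal sketch of one way to do this (on made-up minute data): keep the previous group's key in a variable as you iterate.

import numpy as np
import pandas as pd

# hypothetical minute-level data spanning three days
minute_df = pd.DataFrame(
    {"value": np.arange(3 * 24 * 60)},
    index=pd.date_range("2024-01-01", periods=3 * 24 * 60, freq="min"),
)

day_grouped_df = minute_df.groupby(pd.Grouper(freq='D'))

prev_date = None
for date, group_row in day_grouped_df:
    if prev_date is not None:
        print(f"current day: {date.date()}, previous day: {prev_date.date()}")
    prev_date = date

Alternatively, materialise the keys first, e.g. keys = list(day_grouped_df.groups), and then index into that list with enumerate so that keys[i - 1] is the previous group's date.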
I have a data frame with start_date and end_date columns (e.g. 01-02-2020); based on these two dates a record can be daily (if start and end are one day apart), and similarly monthly, quarterly or yearly.

There is also a Value column (e.g. 3.5).

Now suppose that for a given period there is one monthly record with value 2.5, one quarterly record with 4.5, several daily records like 1.5, and one yearly record like 0.5.
Then for a given date, e.g. 01-01-2020, I need to produce one row that sums the values covering that date (2.5 + 4.5 + 1.5 + 0.5 = 9), so 9 is the total_value on 01-01-2020. Something like below:
There are years of data like this, with multiple records existing for the same time period, and I need the aggregated value for each date, for all distinct 'names'.

I have been trying to do this in Python with no success so far. Any help is appreciated.
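One possible approach (just a sketch on hypothetical data, assuming columns named name, start_date, end_date and value, with end_date treated as exclusive) is to expand every record into the daily dates it covers and then sum per name and date:

import pandas as pd

# hypothetical input: one daily, one monthly, one quarterly and one yearly record,
# all covering 2020-01-01
df = pd.DataFrame({
    "name": ["a", "a", "a", "a"],
    "start_date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01"],
    "end_date": ["2020-01-02", "2020-02-01", "2020-04-01", "2021-01-01"],
    "value": [1.5, 2.5, 4.5, 0.5],
})
df[["start_date", "end_date"]] = df[["start_date", "end_date"]].apply(pd.to_datetime)

# expand each record to one row per day it covers
df["date"] = [list(pd.date_range(s, e - pd.Timedelta(days=1), freq="D"))
              for s, e in zip(df["start_date"], df["end_date"])]
daily = df.explode("date")

# one row per (name, date) with the summed total_value
total = (daily.groupby(["name", "date"], as_index=False)["value"].sum()
              .rename(columns={"value": "total_value"}))
print(total.head())   # 2020-01-01 sums to 1.5 + 2.5 + 4.5 + 0.5 = 9.0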
The picture shows what my dataframe looks like. I have user_name, movie_name and time columns. I want to extract only the rows that fall on each movie's first day. For example, if movie a's first date in the time column is 2018-06-27, I want all the rows on that date, and if movie b's first date is 2018-06-12, I only want those rows. How would I do that with pandas?
I assume that the time column is of datetime type. If not, convert it by calling pd.to_datetime.

Then run:

df.groupby('movie_name').apply(lambda grp:
    grp[grp.time.dt.date == grp.time.min().date()])

groupby splits the source DataFrame into groups, one per film.

Then grp.time.min().date() computes the minimal (first) date in the current group.

And finally the whole lambda returns only the rows from this date (also from the current group).

The same happens for the other groups of rows (films).
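As a quick illustration on made-up data (hypothetical user and movie names):

import pandas as pd

df = pd.DataFrame({
    "user_name": ["u1", "u2", "u3", "u4", "u5"],
    "movie_name": ["a", "a", "a", "b", "b"],
    "time": pd.to_datetime(["2018-06-27 10:00", "2018-06-27 18:30",
                            "2018-06-30 09:00", "2018-06-12 12:00",
                            "2018-06-15 20:00"]),
})

first_day_rows = df.groupby('movie_name').apply(
    lambda grp: grp[grp.time.dt.date == grp.time.min().date()])
print(first_day_rows)
# keeps u1 and u2 (movie a, 2018-06-27) and u4 (movie b, 2018-06-12)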
I have a file with one row per EMID per Effective Date. I need to find the maximum Effective date per EMID that occurred before a specific date. For instance, if EMID =1 has 4 rows, one for 1/1/16, one for 10/1/16, one for 12/1/16, and one for 12/2/17, and I choose the date 1/1/17 as my specific date, I'd want to know that 12/1/16 is the maximum date for EMID=1 that occurred before 1/1/17.
I know how to find the maximum date overall by EMID (groupby.max()). I can also filter the file to just the dates before 1/1/17 and find the max of the remaining rows. However, ultimately I need the last row before 1/1/17 and then all the rows following 1/1/17, so filtering out the rows that occur after that date isn't optimal, because then I have to do complicated joins to get them back in.
import datetime
import random

import numpy as np
import pandas as pd

# Create dummy data
dummy = pd.DataFrame(columns=['EmID', 'EffectiveDate'])
dummy['EmID'] = [random.randint(1, 10000) for x in range(49999)]
dummy['EffectiveDate'] = [np.random.choice(pd.date_range(datetime.datetime(2016, 1, 1), datetime.datetime(2018, 1, 3))) for i in range(49999)]

# Create group by
g = dummy.groupby('EmID')['EffectiveDate']

# This doesn't work, but it effectively shows what I'm trying to do
dummy['max_prestart'] = max(dt for dt in g if dt < datetime.datetime(2017, 1, 1))
I expect the output to be an additional column in my dataframe that holds the maximum date that occurred before the specified date.
Using map after selecting the rows before the cutoff date:

s = dummy.loc[dummy.EffectiveDate < '2017-01-01'].groupby('EmID').EffectiveDate.max()
dummy['new'] = dummy.EmID.map(s)
Or using transform, and falling back to the row's own EffectiveDate where no date before the cutoff exists:

dummy['new'] = dummy.loc[dummy.EffectiveDate < '2017-01-01'].groupby('EmID').EffectiveDate.transform('max')
dummy['new'] = dummy['new'].fillna(dummy.EffectiveDate)
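A quick illustration of the map variant on toy data (hypothetical EmIDs and dates): rows whose EmID has no date before the cutoff stay NaT, whereas the transform + fillna version would fall back to the row's own EffectiveDate.

import pandas as pd

dummy = pd.DataFrame({
    "EmID": [1, 1, 1, 2],
    "EffectiveDate": pd.to_datetime(["2016-01-01", "2016-12-01",
                                     "2017-12-02", "2017-05-01"]),
})

cutoff = '2017-01-01'
s = dummy.loc[dummy.EffectiveDate < cutoff].groupby('EmID').EffectiveDate.max()
dummy['new'] = dummy.EmID.map(s)
print(dummy)
# all EmID 1 rows get 2016-12-01; EmID 2 has no date before the cutoff, so it gets NaT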
I have 4 columns: Date, Account #, Quantity and Sale. I have daily data, but I want to be able to show weekly Sales per customer along with the Quantity.

I have been able to group the data by week, but I also want to group it by OracleNumber and sum the Quantity and Sale columns. How would I get that to work without messing up the week format?
import pandas as pd

names = ['Date', 'OracleNumber', 'Quantity', 'Sale']
sales = pd.read_csv("CustomerSalesNVG.csv", names=names)
sales['Date'] = pd.to_datetime(sales['Date'])
grouped = sales.groupby(sales['Date'].map(lambda x: x.week))
print(grouped.head())
IIUC, you could group by the week of the Date column and by the OracleNumber column by passing both keys in a list to groupby, and then perform the sum:
sales.groupby([sales['Date'].dt.week, 'OracleNumber']).sum()
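One caveat beyond the original answer: Series.dt.week was deprecated and later removed in recent pandas releases, so on newer versions the same grouping can be written with isocalendar():

# equivalent grouping on newer pandas versions, where .dt.week is no longer available
weekly = sales.groupby([sales['Date'].dt.isocalendar().week, 'OracleNumber'])[['Quantity', 'Sale']].sum()
print(weekly.head())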
I'm trying to recreate a somewhat convoluted scenario, but I will do my best to explain it:

Create a pandas df1 with two columns: 'Date' and 'Price' - done.

Add two new columns, 'rollmax' and 'rollmin', where 'rollmax' is an 8-day rolling maximum and 'rollmin' is an 8-day rolling minimum - done.

Now I need to create another column, 'rollmax_date', populated through a lookup rule: for row n, look at the 'Price' values over the last 8 days, find the maximum, then take the 'Date' corresponding to that maximum and put it in the 'rollmax_date' column.

The same logic applies to 'rollmin_date', but looking for the date of the rolling minimum instead.

Then I need to find the max and min of the previous 8 days, i.e. the 8-day window immediately preceding the one I have already computed.

I did the first two steps and tried the third one, but I'm getting wrong results.
The code below gives me dates only where df['Price'] on that row equals df['rollmax'], but it doesn't bring all the corresponding dates from 'Date' into 'rollmax_date':

df['rollmax_date'] = df.loc[(df["Price"] == df.rollmax), 'Date']
This is an image with steps for recreating the lookup
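One way to get the date of the rolling maximum (just a sketch on made-up data, not the original poster's solution) is to use rolling().apply() with idxmax/idxmin, which returns the index label of the extreme value inside each window, and then look that label up in the 'Date' column:

import numpy as np
import pandas as pd

# hypothetical daily data with a plain RangeIndex
df = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=15, freq="D"),
    "Price": np.random.default_rng(0).uniform(10, 100, 15).round(2),
})

win = 8
df["rollmax"] = df["Price"].rolling(win).max()
df["rollmin"] = df["Price"].rolling(win).min()

# index label of the max / min inside each 8-row window
max_idx = df["Price"].rolling(win).apply(lambda w: w.idxmax(), raw=False)
min_idx = df["Price"].rolling(win).apply(lambda w: w.idxmin(), raw=False)

# translate those labels into dates (NaT while the window is still incomplete)
df["rollmax_date"] = max_idx.map(lambda i: df.at[int(i), "Date"] if pd.notna(i) else pd.NaT)
df["rollmin_date"] = min_idx.map(lambda i: df.at[int(i), "Date"] if pd.notna(i) else pd.NaT)
print(df.tail())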