I have three sheets inside one Excel spreadsheet. I am trying to obtain the output listed below, or something close to it. The desired outcome is to know when there will be a shortage so that I can proactively order and prevent it. All of these, except for the output, are in one Excel file, each on a different sheet. How hard will this be to achieve, and is it possible? Note that all the sheets listed contain many other data columns, so columns may need to be referenced positionally (iloc) or by name (loc).
instock sheet
product someother datapoint qty
5.25 1 2 100
5.25 1 3 200
6 2 1 50
6 4 1 500
ordered sheet
product something ordernum qty date
5 1/4 abc 52521 50 07/01/2019
5 1/4 ddd 22911 100 07/28/2019
6 eeee 72944 10 07/5/2019
promised sheet
product order qty date
5 1/4 456 300 06/12/2019
5 1/4 789 50 06/20/2019
5 1/4 112 50 07/20/2019
6 113 800 07/22/2019
5 1/4 144 50 07/28/2019
9 155 100 08/22/2019
Output
product date onhand qtyordered committed balance shortage
5.25 06/10 300 300 n
5.25 06/12 300 300 0 n
5.25 06/20 0 50 -50 y
5.25 07/01 -50 50 0 n
6 07/05 550 10 0 560 n
5.25 07/20 0 50 -50 y
6 07/22 560 0 800 -240 y
5.25 07/28 -50 100 50 0 n
9 08/22 0 0 100 -100 y
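A minimal sketch of one way to start on this with pandas, assuming a workbook named inventory.xlsx with the sheet and column names shown above (both are assumptions; adjust to the real headers). Incoming orders add to the running balance and promised orders subtract from it:
import pandas as pd

# Sheet and column names below are assumptions based on the samples above.
instock = pd.read_excel('inventory.xlsx', sheet_name='instock')
ordered = pd.read_excel('inventory.xlsx', sheet_name='ordered')
promised = pd.read_excel('inventory.xlsx', sheet_name='promised')

# Note: product labels differ between sheets ('5 1/4' vs 5.25) and would
# need to be normalized to a common key before any of this can work.
onhand = instock.groupby('product')['qty'].sum()  # starting stock per product

# Incoming orders add stock, promised orders consume it.
events = pd.concat([
    ordered.assign(change=ordered['qty']),
    promised.assign(change=-promised['qty']),
])
events['date'] = pd.to_datetime(events['date'])
events = events.sort_values('date')

# Running balance per product: starting stock plus cumulative changes.
events['balance'] = (events['product'].map(onhand).fillna(0)
                     + events.groupby('product')['change'].cumsum())
events['shortage'] = events['balance'].lt(0).map({True: 'y', False: 'n'})
Rows where shortage is 'y' mark the dates on which the running balance goes negative, which is roughly the output table sketched above.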
Related
What is the easiest way to convert the following ascending data frame:
start end
0 100 500
1 400 700
2 450 580
3 750 910
4 920 940
5 1000 1200
6 1100 1300
into
start end
0 100 700
1 750 910
2 920 940
3 1000 1300
You may notice that rows 0:3 and rows 5:7 were merged, because these rows overlap or one is a sub-interval of another; each merged group is reduced to a single start and end.
Use a custom group built with shift and cumsum to identify the overlapping intervals, then keep the min start and max end of each group (first/last would give 580 instead of 700 for the first group here, because the ends are not monotone):
group = df['start'].gt(df['end'].shift()).cumsum()
out = df.groupby(group).agg({'start': 'min', 'end': 'max'})
output:
start end
0 100 700
1 750 910
2 920 940
3 1000 1300
intermediate group:
0 0
1 0
2 0
3 1
4 2
5 3
6 3
dtype: int64
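A hedged side note: the shift comparison only looks at the immediately preceding row's end, so an interval that is fully contained in an earlier, non-adjacent one can start a spurious new group. A variant that compares each start against the running maximum end is more robust:
# compare each start against the running maximum of all previous ends
group = df['start'].gt(df['end'].cummax().shift()).cumsum()
out = df.groupby(group).agg({'start': 'min', 'end': 'max'})
On the sample data this produces the same groups; it only differs when the ends are not non-decreasing.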
I need to pivot my data in a df, as shown below, based on a specific date in the YYMMDD and HHMM columns ("20180101 100"). Each occurrence of this specific date marks the start of a new category of data with an equal number of rows. I plan on replacing the repeating column names in the output with unique names. Suppose my data looks like this below.
YYMMDD HHMM BestGuess(kWh)
0 20180101 100 20
1 20180101 200 70
2 20201231 2100 50
3 20201231 2200 90
4 20201231 2300 70
5 20210101 000 40
6 20180101 100 5
7 20180101 200 7
8 20201231 2100 2
9 20201231 2200 3
10 20201231 2300 1
11 20210101 000 4
I need the new df (dfpivot) to look like this:
YYMMDD HHMM BestGuess(kWh) BestGuess(kWh)
0 20180101 100 20 5
1 20180101 200 70 7
2 20201231 2100 50 2
3 20201231 2200 90 3
4 20201231 2300 70 1
5 20210101 000 40 4
Does this suffice?
cols = ['YYMMDD', 'HHMM']
# cumcount numbers the repeated (YYMMDD, HHMM) rows 0, 1, ... so unstack can spread them into columns
df.set_index([*cols, df.groupby(cols).cumcount()]).unstack()
BestGuess(kWh)
0 1
YYMMDD HHMM
20180101 100 20 5
200 70 7
20201231 2100 50 2
2200 90 3
2300 70 1
20210101 0 40 4
More fully baked
cols = ['YYMMDD', 'HHMM']
temp = df.set_index([*cols, df.groupby(cols).cumcount()]).unstack()
# flatten the MultiIndex columns into single strings like 'BestGuess(kWh) 0'
temp.columns = [f'{l0} {l1}' for l0, l1 in temp.columns]
temp.reset_index()
YYMMDD HHMM BestGuess(kWh) 0 BestGuess(kWh) 1
0 20180101 100 20 5
1 20180101 200 70 7
2 20201231 2100 50 2
3 20201231 2200 90 3
4 20201231 2300 70 1
5 20210101 0 40 4
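An equivalent route, sketched under the assumption of a pandas version where pivot accepts list arguments (1.1+): number the repeats explicitly, then pivot them into side-by-side columns.
cols = ['YYMMDD', 'HHMM']
out = (df.assign(n=df.groupby(cols).cumcount())       # 0, 1, ... per repeated timestamp
         .pivot(index=cols, columns='n', values='BestGuess(kWh)')
         .add_prefix('BestGuess(kWh) ')               # unique column names up front
         .reset_index())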
I have the following dataframe in pandas:
ID Quantity Rate Product
1 10 70 MS
2 10 70 MS
3 100 70 MS
4 10 100 MS
5 700 65 HS
6 1100 65 HS
7 700 100 HS
I want to cap values in Quantity and Rate with mean values. For MS, if Quantity is greater than 100 or Rate is greater than 99, the value should be replaced by the mean of the in-range MS values; for HS, the same rule applies with a Quantity threshold of 1000 instead of 100.
I am using the following approach:
mean_MS = df['Quantity'][(df['Product'] == 'MS') and (df['Quantity'] < 100)].mean()
But it does not work.
My desired dataframe would be
ID Quantity Rate Product
1 10 70 MS
2 10 70 MS
3 10 70 MS
4 10 70 MS
5 700 65 HS
6 700 65 HS
7 700 65 HS
One way to solve this:
# flag the MS rows
m1 = df['Product'] == 'MS'
# MS rows whose Quantity or Rate breaches the thresholds
m2 = (df['Quantity'] >= 100) | (df['Rate'] > 99)
# replace breaching values with the mean of the in-range MS values
df.loc[m1 & m2, 'Quantity'] = df[m1 & (df['Quantity'] < 100)]['Quantity'].mean()
df.loc[m1 & m2, 'Rate'] = df[m1 & (df['Rate'] < 99)]['Rate'].mean()
# same logic for HS, with the Quantity threshold raised to 1000
m3 = df['Product'] == 'HS'
m4 = (df['Quantity'] >= 1000) | (df['Rate'] > 99)
df.loc[m3 & m4, 'Quantity'] = df[m3 & (df['Quantity'] < 1000)]['Quantity'].mean()
df.loc[m3 & m4, 'Rate'] = df[m3 & (df['Rate'] < 99)]['Rate'].mean()
Output:
ID Quantity Rate Product
0 1 10.0 70.0 MS
1 2 10.0 70.0 MS
2 3 10.0 70.0 MS
3 4 10.0 70.0 MS
4 5 700.0 65.0 HS
5 6 700.0 65.0 HS
6 7 700.0 65.0 HS
Explanation:
Divide the problem into two sub-problems, one for MS and one for HS; both use the same logic and differ only in the Quantity threshold.
First, flag the MS rows in m1. Then, wherever Quantity is greater than or equal to 100 or Rate is greater than 99, replace Quantity with the mean taken over the MS rows that are still in range, i.e. excluding the rows that breach the condition.
Repeat the same logic for Rate.
Repeat steps 2 and 3 for HS, with the Quantity threshold changed from 100 to 1000.
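A hedged sketch of the same logic with the repetition factored into a loop; the thresholds mapping is hypothetical, mirroring the values above:
# product -> Quantity cap; the Rate cap of 99 is shared
thresholds = {'MS': 100, 'HS': 1000}
for product, qty_cap in thresholds.items():
    m = df['Product'].eq(product)
    breach = m & ((df['Quantity'] >= qty_cap) | (df['Rate'] > 99))
    # means come from the in-range rows only
    df.loc[breach, 'Quantity'] = df.loc[m & (df['Quantity'] < qty_cap), 'Quantity'].mean()
    df.loc[breach, 'Rate'] = df.loc[m & (df['Rate'] < 99), 'Rate'].mean()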
IIUC, you can also try the below:
val1 = df.loc[df.Product.eq('MS'), ['Quantity', 'Rate']].mode().values
# array([[10, 70]], dtype=int64)
val2 = df.loc[df.Product.eq('HS'), ['Quantity', 'Rate']].mode().values
# array([[700, 65]], dtype=int64)
df.loc[df.Product.eq('MS') & (df.Quantity.ge(100) | df.Rate.gt(99)), ['Quantity', 'Rate']] = val1
df.loc[df.Product.eq('HS') & (df.Quantity.ge(1000) | df.Rate.gt(99)), ['Quantity', 'Rate']] = val2
print(df)
ID Quantity Rate Product
0 1 10 70 MS
1 2 10 70 MS
2 3 10 70 MS
3 4 10 70 MS
4 5 700 65 HS
5 6 700 65 HS
6 7 700 65 HS
I have a table in a pandas df:
id product_1 count
1 100 10
2 200 20
3 100 30
4 400 40
5 500 50
6 200 60
7 100 70
I also have another table in dataframe df2:
product score
100 5
200 10
300 15
400 20
500 25
600 30
700 35
I have to create a new column score in my first df, taking the values of score from df2 with respect to product_1.
My final output should be df =
id product_1 count score
1 100 10 5
2 200 20 10
3 100 30 5
4 400 40 20
5 500 50 25
6 200 60 10
7 100 70 5
Any ideas how to achieve it?
Use map:
df['score'] = df['product_1'].map(df2.set_index('product')['score'].to_dict())
print (df)
id product_1 count score
0 1 100 10 5
1 2 200 20 10
2 3 100 30 5
3 4 400 40 20
4 5 500 50 25
5 6 200 60 10
6 7 100 70 5
Or merge:
df = pd.merge(df,df2, left_on='product_1', right_on='product', how='left')
print (df)
id product_1 count product score
0 1 100 10 100 5
1 2 200 20 200 10
2 3 100 30 100 5
3 4 400 40 400 20
4 5 500 50 500 25
5 6 200 60 200 10
6 7 100 70 100 5
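The merge keeps the product key as an extra column; if that is unwanted, it can be dropped afterwards to match the desired output:
df = df.drop(columns='product')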
EDIT by comment:
df['score'] = df['product_1'].map(df2.set_index('product')['score'].to_dict())
df['final_score'] = (df['count'].mul(0.6).div(df.id)).add(df.score.mul(0.4))
print (df)
id product_1 count score final_score
0 1 100 10 5 8.0
1 2 200 20 10 10.0
2 3 100 30 5 8.0
3 4 400 40 20 14.0
4 5 500 50 25 16.0
5 6 200 60 10 10.0
6 7 100 70 5 8.0
I have a dataframe from which I need to calculate a number of features. The dataframe df looks something like this for an object and an event:
id event_id event_date age money_spent rank
1 100 2016-10-01 4 150 2
2 100 2016-09-30 5 10 4
1 101 2015-12-28 3 350 3
2 102 2015-10-25 5 400 5
3 102 2015-10-25 7 500 2
1 103 2014-04-15 2 1000 1
2 103 2014-04-15 3 180 6
From this I need to know, for each id and event_id (basically each row), the number of days since the last event date, the total money spent up to that date, the average money spent up to that date, the rank in the last 3 events, etc.
What is the best way to work with this kind of problem in pandas, where for each row I need information from all rows with the same id before that row's date in order to do the calculations? I want to return a new dataframe with the corresponding calculated features, like
id event_id event_date days_last_event avg_money_spent total_money_spent
1 100 2016-10-01 278 500 1500
2 100 2016-09-30 361 196.67 590
1 101 2015-12-28 622 675 1350
2 102 2015-10-25 558 290 580
3 102 2015-10-25 0 500 500
1 103 2014-04-15 0 1000 1000
2 103 2014-04-15 0 180 180
I came up with the following solution:
df1 = df.sort_values(by="event_date")  # ascending, so cumulative stats run oldest to newest
g = df1.groupby("id")
df1["total_money_spent"] = g["money_spent"].cumsum()
df1["count"] = g.cumcount()
df1["avg_money_spent"] = df1["total_money_spent"] / (df1["count"] + 1)