What is the easiest way to convert the following ascending data frame:
start end
0 100 500
1 400 700
2 450 580
3 750 910
4 920 940
5 1000 1200
6 1100 1300
into
start end
0 100 700
1 750 910
2 920 940
3 1000 1300
You may notice that rows 0:3 and 5:7 were merged, because these rows overlap or one row is subpart of another: actually, they have only one start and end.
Use a custom group with shift to identify the overlapping intervals and keep the first start and last end (or min/max if you prefer):
group = df['start'].gt(df['end'].shift()).cumsum()
out = df.groupby(group).agg({'start': 'first', 'end': 'last'})
output:
start end
0 100 580
1 750 910
2 920 940
3 1000 1300
intermediate group:
0 0
1 0
2 0
3 1
4 2
5 3
6 3
dtype: int64
I have seen a lot of SUMIFS question being answered here but is very different from the one I need.
1st Trade data frame contains transaction id and C_ID
transaction C_ID
1 101
2 103
3 104
4 101
5 102
6 104
2nd Customer data frame contains C_ID, On/Off, Amount
C_ID On/Off Amount
102 On 320
101 On 400
101 On 200
103 On 60
104 Off 80
104 On 100
So i want to calculate the Amount based on the C_ID with a condition on column 'On/Off' in Customer data frame. The resulting trade data frame should be
transaction C_ID Amount
1 101 600
2 103 60
3 104 100
4 101 600
5 102 320
6 104 100
So here’s the formula in EXCEL on how Amount are calculated
=SUMIFS(Customer.Amount, Customer.C_ID = Trade.C_ID, Customer.On/Off = On)
So i want to replicate this particular formula in Python using Pandas
You can use groupby() on filtered data to compute the sum and map to assign new column to transaction data.
s = df2[df2['On/Off']=='On'].groupby('C_ID')['Amount'].sum()
df1['Amount'] = df1['C_ID'].map(s)
We do filter groupby + reindex assign
df1['Amount']=df2.loc[df2['On/Off']=='On'].groupby(['C_ID']).Amount.sum().reindex(df1.C_ID).tolist()
df1
Out[340]:
transaction C_ID Amount
0 1 101 600
1 2 103 60
2 3 104 100
3 4 101 600
4 5 102 320
5 6 104 100
I have three sheets inside one excel spreadsheet. I am trying to obtain the output listed below, or something close to it. The desired outcome is to know when there will be a shortage so that I can attempt to re-actively order and prevent the shortage. All of these, except for the output, is on one excel file. Each are different sheets. How hard will this be to achieve, is this possible? Note that in all sheets listed, there are tons of other data columns so positional references to columns may be needed, or using iloc to call upon columns by name.
instock sheet
product someother datapoint qty
5.25 1 2 100
5.25 1 3 200
6 2 1 50
6 4 1 500
ordered
product something ordernum qty date
5 1/4 abc 52521 50 07/01/2019
5 1/4 ddd 22911 100 07/28/2019
6 eeee 72944 10 07/5/2019
promised
product order qty date
5 1/4 456 300 06/12/2019
5 1/4 789 50 06/20/2019
5 1/4 112 50 07/20/2019
6 113 800 07/22/2019
5 1/4 144 50 07/28/2019
9 155 100 08/22/2019
Output
product date onhand qtyordered commited balance shortage
5.25 06/10 300 300 n
5.25 06/12 300 300 0 n
5.25 06/20 0 50 -50 y
5.25 07/01 -50 50 0 n
6 07/05 550 10 0 560 n
5.25 07/20 0 50 -50 y
6 07/22 560 0 800 -240 y
5.25 07/28 -50 100 50 0 n
9 08/22 0 0 100 -100 y
I have a pandas dataframe and I need to work out the cumulative sum for each month.
Date Amount
2017/01/12 50
2017/01/12 30
2017/01/15 70
2017/01/23 80
2017/02/01 90
2017/02/01 10
2017/02/02 10
2017/02/03 10
2017/02/03 20
2017/02/04 60
2017/02/04 90
2017/02/04 100
The cumulative sum is the trailing sum for each day i.e 01-31. However, some days are missing. The data frame should look like
Date Sum_Amount
2017/01/12 80
2017/01/15 150
2017/01/23 203
2017/02/01 100
2017/02/02 110
2017/02/03 140
2017/02/04 390
You can use if only need cumsum by months groupby with sum and then group by values of index converted to month:
df.Date = pd.to_datetime(df.Date)
df = df.groupby('Date').Amount.sum()
df = df.groupby(df.index.month).cumsum().reset_index()
print (df)
Date Amount
0 2017-01-12 80
1 2017-01-15 150
2 2017-01-23 230
3 2017-02-01 100
4 2017-02-02 110
5 2017-02-03 140
6 2017-02-04 390
But if need but months and years need convert to month period by to_period:
df = df.groupby(df.index.to_period('m')).cumsum().reset_index()
Difference is better seen in changed df - added different year:
print (df)
Date Amount
0 2017/01/12 50
1 2017/01/12 30
2 2017/01/15 70
3 2017/01/23 80
4 2017/02/01 90
5 2017/02/01 10
6 2017/02/02 10
7 2017/02/03 10
8 2018/02/03 20
9 2018/02/04 60
10 2018/02/04 90
11 2018/02/04 100
df.Date = pd.to_datetime(df.Date)
df = df.groupby('Date').Amount.sum()
df = df.groupby(df.index.month).cumsum().reset_index()
print (df)
Date Amount
0 2017-01-12 80
1 2017-01-15 150
2 2017-01-23 230
3 2017-02-01 100
4 2017-02-02 110
5 2017-02-03 120
6 2018-02-03 140
7 2018-02-04 390
df.Date = pd.to_datetime(df.Date)
df = df.groupby('Date').Amount.sum()
df = df.groupby(df.index.to_period('m')).cumsum().reset_index()
print (df)
Date Amount
0 2017-01-12 80
1 2017-01-15 150
2 2017-01-23 230
3 2017-02-01 100
4 2017-02-02 110
5 2017-02-03 120
6 2018-02-03 20
7 2018-02-04 270
I have two data frames. One representing when an order was placed and arrived, while the other one represents the working days of the shop.
Days are taken as days of the year. i.e. 32 = 1th February.
orders = DataFrame({'placed':[100,103,104,105,108,109], 'arrived':[103,104,105,106,111,111]})
Out[25]:
arrived placed
0 103 100
1 104 103
2 105 104
3 106 105
4 111 108
5 111 109
calendar = DataFrame({'day':['100','101','102','103','104','105','106','107','108','109','110','111','112','113','114','115','116','117','118','119','120'], 'closed':[0,1,1,0,0,0,0,0,1,1,0,0,0,0,0,1,1,0,0,0,0]})
Out[21]:
closed day
0 0 100
1 1 101
2 1 102
3 0 103
4 0 104
5 0 105
6 0 106
7 0 107
8 1 108
9 1 109
10 0 110
11 0 111
12 0 112
13 0 113
14 0 114
15 1 115
16 1 116
17 0 117
18 0 118
19 0 119
20 0 120
What i want to do is to compute the difference between placed and arrived
x = orders['arrived'] - orders['placed']
Out[24]:
0 3
1 1
2 1
3 1
4 3
5 2
dtype: int64
and subtract one if any day between arrived and placed (included) was a day in which the shop was closed.
i.e. in the first row the order is placed on day 100 and arrived on day 103. the day used are 100, 101, 102, 103. the difference between 103 and 100 is 3. However, since 101 and 102 are days in which the shop is closed I want to subtract 1 for each. That is 3 -1 -1 = 1. And finally append this result on the orders df.