I am having trouble defining a constraint in Pyomo.
I have the following DF:
Group  Product  Product type  is_something  ID  value  date
A      ABSP     0.10          1             1   15     2022-06-01
A      ABSL     0.10          1             1   15     2022-06-01
A      ABSB0    0.10          0             1   15     2022-06-01
A      ABSB1    0.15          1             2   2      2022-06-01
A      ABSB0    0.10          0             2   2      2022-06-01
A      ABSP     0.10          1             1   10     2022-09-15
A      ABSL     0.10          1             1   10     2022-09-15
A      ABSB0    0.10          0             1   15     2022-09-15
A      ABSB1    0.15          1             2   2      2022-09-15
A      ABSB0    0.10          0             2   2      2022-09-15
This dataframe represents some events.
The column is_something tells me whether I can play the event alone or whether it must be played together with the other products that share the same ID.
The constraint that I have:
For a given ID, force an event with is_something = 1 to play together with all products that share the same ID within the group (same Group, Product, Product type, ID, value, and date).
I have defined:
VAR = Var(describe, within=Binary)
where describe is the combination of Group, Product, Product type, value, date.
Could the mathematical constraint be: number of promotions where is_something = 1 <= number of events of the group?
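One way this linkage could be modelled, as a minimal sketch (the toy dataframe and the variable/constraint names here are assumptions for illustration, not your actual model):

import pandas as pd
from pyomo.environ import ConcreteModel, Var, Constraint, Binary

# Assumed toy version of the events table above
df = pd.DataFrame({
    'Product':      ['ABSP', 'ABSL', 'ABSB0', 'ABSB1', 'ABSB0'],
    'is_something': [1, 1, 0, 1, 0],
    'ID':           [1, 1, 1, 2, 2],
})

model = ConcreteModel()
events = list(df.index)
model.play = Var(events, within=Binary)  # 1 if event i is played

def link_rule(m, i, j):
    # If event i has is_something = 1 and event j shares its ID,
    # playing event i forces playing event j as well.
    if i != j and df.loc[i, 'is_something'] == 1 \
            and df.loc[i, 'ID'] == df.loc[j, 'ID']:
        return m.play[i] <= m.play[j]
    return Constraint.Skip

model.link = Constraint(events, events, rule=link_rule)

With this pairwise linking, any solution that plays an is_something = 1 event must also play every other event sharing its ID.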
What I want to do is sum the ratios from the 1st table based on which children appear in each observation of the 2nd table.
So, for example, for the 1st observation I want to compute 0.52 (1st child, 16-17) + 0.84 (2nd child, 11-13) + 0.78 (3rd child, 0-3) = 2.14, and create a new column holding those values.
There are no observations with more than 1 child in any age range. The "Child_18-older_18" and "Pregnant" columns should be treated as ages 16-17 and 0-3 in the ratio table, respectively. As for the 2nd table, the full dataframe consists of 4000 observations; these 5 were picked at random.
Age    First_child_ratio  Second_child_ratio  Third_child_ratio  Fourth_child_ratio
0-3    1.0                0.72                0.78               0.66
4-6    0.83               0.6                 0.65               0.54
7-10   0.77               0.69                0.73               0.59
11-13  0.88               0.84                0.87               0.86
14-15  0.52               0.52                0.68               0.68
16-17  0.52               0.52                0.52               0.52
Pregnant  Child_0-3  Child_4-6  Child_7-10  Child_11-13  Child_14-15  Child_16-17  Child_18-older_18  No_Child
0         1          0          0           1            0            1            0                  0
0         1          0          1           0            0            0            0                  0
0         0          1          0           1            0            1            1                  0
1         0          1          0           0            0            0            0                  0
0         0          0          0           0            0            0            0                  1
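A minimal sketch of one way to compute this, assuming the reconstructed tables above (the names ratios, obs, ratio_cols, age_map, and ratio_sum are illustrative, not from the original post):

import pandas as pd

# Ratio table from above, indexed by age range
ratios = pd.DataFrame(
    {'First_child_ratio':  [1.0, 0.83, 0.77, 0.88, 0.52, 0.52],
     'Second_child_ratio': [0.72, 0.6, 0.69, 0.84, 0.52, 0.52],
     'Third_child_ratio':  [0.78, 0.65, 0.73, 0.87, 0.68, 0.52],
     'Fourth_child_ratio': [0.66, 0.54, 0.59, 0.86, 0.68, 0.52]},
    index=['0-3', '4-6', '7-10', '11-13', '14-15', '16-17'])
ratio_cols = list(ratios.columns)

# Observation columns from oldest to youngest; 18-or-older maps to
# 16-17 and Pregnant maps to 0-3, as described above.
age_map = [('Child_18-older_18', '16-17'), ('Child_16-17', '16-17'),
           ('Child_14-15', '14-15'), ('Child_11-13', '11-13'),
           ('Child_7-10', '7-10'), ('Child_4-6', '4-6'),
           ('Child_0-3', '0-3'), ('Pregnant', '0-3')]

def ratio_sum(row):
    total, position = 0.0, 0
    for col, age in age_map:            # oldest child counts as first
        if row.get(col, 0) == 1:
            total += ratios.loc[age, ratio_cols[position]]
            position += 1
    return total

# obs is the 0/1 observation dataframe shown above:
# obs['ratio_sum'] = obs.apply(ratio_sum, axis=1)

For the 1st observation this gives 0.52 + 0.84 + 0.78 = 2.14, matching the example.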
I have the following dataset:
import pandas as pd

my_df = pd.DataFrame({'id':[1,2,3,4,5],
'type':['corp','smb','smb','corp','mid'],
'sales':[34567,2190,1870,22000,10000],
'sales_roi':[.10,.21,.22,.15,.16],
'sales_pct':[.38,.05,.08,.30,.20],
'sales_ln':[4.2,2.1,2.0,4.1,4],
'cost_pct':[22000,1000,900,14000,5000],
'flag':[0,1,0,1,1],
'gibberish':['bla','ble','bla','ble','bla'],
'tech':['lnx','mst','mst','lnx','mc']})
my_df['type'] = pd.Categorical(my_df.type)
my_df
id type sales sales_roi sales_pct sales_ln cost_pct flag gibberish tech
0 1 corp 34567 0.10 0.38 4.2 22000 0 bla lnx
1 2 smb 2190 0.21 0.05 2.1 1000 1 ble mst
2 3 smb 1870 0.22 0.08 2.0 900 0 bla mst
3 4 corp 22000 0.15 0.30 4.1 14000 1 ble lnx
4 5 mid 10000 0.16 0.20 4.0 5000 1 bla mc
And I want to filter out all columns that end in "_pct" or "_ln", or that are named "gibberish" or "tech". This is what I have tried:
df_selected = my_df.loc[:, ~my_df.columns.str.endswith('_pct') &
~my_df.columns.str.endswith('_ln') &
~my_df.columns.str.contains('gibberish','tech')]
But it returns an unwanted column ("tech"):
id type sales sales_roi flag tech
0 1 corp 34567 0.10 0 lnx
1 2 smb 2190 0.21 1 mst
2 3 smb 1870 0.22 0 mst
3 4 corp 22000 0.15 1 lnx
4 5 mid 10000 0.16 1 mc
This is the expected result:
id type sales sales_roi flag
0 1 corp 34567 0.10 0
1 2 smb 2190 0.21 1
2 3 smb 1870 0.22 0
3 4 corp 22000 0.15 1
4 5 mid 10000 0.16 1
Please consider that I have to deal with hundreds of variables and this is just an example of what I need.
What you are doing keeps "tech" because the second positional argument of str.contains is case, not another pattern, so 'tech' is never excluded. str.endswith accepts a tuple, so just put all the suffixes and names you are looking for in a single tuple and then filter:
my_df[my_df.columns[~my_df.columns.str.endswith(('_pct','_ln','gibberish','tech'))]]
id type sales sales_roi flag
0 1 corp 34567 0.10 0
1 2 smb 2190 0.21 1
2 3 smb 1870 0.22 0
3 4 corp 22000 0.15 1
4 5 mid 10000 0.16 1
I would do it like this:
criterion = ["_pct", "_ln", "gibberish", "tech"]
for column in my_df:
    for criteria in criterion:
        if criteria in column:
            my_df = my_df.drop(column, axis=1)
            break
Of course, you can change the if statement to use endswith or another check of your choice.
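With hundreds of columns, a single regex passed to str.contains can also be convenient; a small sketch on the example frame above (the pattern is mine, not from either answer):

# Drop columns that end in _pct/_ln or are named gibberish/tech
pattern = r'(?:_pct|_ln)$|^(?:gibberish|tech)$'
df_selected = my_df.loc[:, ~my_df.columns.str.contains(pattern)]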
I have a dataframe of several hundred thousand rows, in the following format:
time_elapsed cycle
0 0.00 1
1 0.50 1
2 1.00 1
3 1.30 1
4 1.50 1
5 0.00 2
6 0.75 2
7 1.50 2
8 3.00 2
I want to create a third column that gives each row's time as a percentage of its cycle's total time (i.e., up to the next time_elapsed = 0). To give something like:
time_elapsed cycle percentage
0 0.00 1 0
1 0.50 1 33
2 1.00 1 75
3 1.30 1 87
4 1.50 1 100
5 0.00 2 0
6 0.75 2 25
7 1.50 2 50
8 3.00 2 100
I'm not fussed about the number of decimal places, I've just excluded them for ease here.
I started going along this route, but I keep getting errors.
data['percentage'] = data['time_elapsed'].sub(data.groupby(['cycle'])['time_elapsed'].transform(lambda x: x*100/data['time_elapsed'].max()))
I think it's the lambda function causing errors, but I'm not sure what I should do to change it. Any help is much appreciated :)
Use Series.div for division instead of sub for subtraction; the solution then simplifies: get only the max per group, multiply by 100 with Series.mul, round with Series.round if necessary, and finally convert to integers with Series.astype:
s = data.groupby(['cycle'])['time_elapsed'].transform('max')
data['percentage'] = data['time_elapsed'].div(s).mul(100).round().astype(int)
print (data)
time_elapsed cycle percentage
0 0.00 1 0
1 0.50 1 33
2 1.00 1 67
3 1.30 1 87
4 1.50 1 100
5 0.00 2 0
6 0.75 2 25
7 1.50 2 50
8 3.00 2 100
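For what it's worth, the original lambda approach also works once each group is normalised by its own max rather than the global one; a small sketch of that variant:

# Equivalent transform-with-lambda version: divide by the group's own max
data['percentage'] = (data.groupby('cycle')['time_elapsed']
                      .transform(lambda x: x / x.max() * 100)
                      .round().astype(int))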
I have the following dataset, which I would like to rank by region and also by store type (within each region).
Is there a slick way of coding these 2 columns in python?
Data:
print (df)
Region ID Location store Type ID Brand share
0 1 Warehouse 1.97
1 1 Warehouse 0.24
2 1 Super Centre 0.21
3 1 Warehouse 0.13
4 1 Mini Warehouse 0.10
5 1 Super Centre 0.07
6 1 Mini Warehouse 0.04
7 1 Super Centre 0.02
8 1 Mini Warehouse 0.02
9 10 Warehouse 0.64
10 10 Mini Warehouse 0.18
11 10 Warehouse 0.13
12 10 Warehouse 0.09
13 10 Super Centre 0.07
14 10 Mini Warehouse 0.03
15 10 Mini Warehouse 0.02
16 10 Super Centre 0.02
Use GroupBy.cumcount:
df['RegionRank'] = df.groupby('Region ID')['Brand share'].cumcount() + 1
cols = ['Location store Type ID', 'Region ID']
df['StoreTypeRank'] = df.groupby(cols)['Brand share'].cumcount() + 1
print (df)
Region ID Location store Type ID Brand share RegionRank StoreTypeRank
0 1 Warehouse 1.97 1 1
1 1 Warehouse 0.24 2 2
2 1 Super Centre 0.21 3 1
3 1 Warehouse 0.13 4 3
4 1 Mini Warehouse 0.10 5 1
5 1 Super Centre 0.07 6 2
6 1 Mini Warehouse 0.04 7 2
7 1 Super Centre 0.02 8 3
8 1 Mini Warehouse 0.02 9 3
9 10 Warehouse 0.64 1 1
10 10 Mini Warehouse 0.18 2 1
11 10 Warehouse 0.13 3 2
12 10 Warehouse 0.09 4 3
13 10 Super Centre 0.07 5 1
14 10 Mini Warehouse 0.03 6 2
15 10 Mini Warehouse 0.02 7 3
16 10 Super Centre 0.02 8 2
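Note that cumcount numbers rows in their current order, so this assumes the frame is already sorted by Brand share descending within each region; if it is not, sorting first would be a reasonable safeguard (a sketch, not part of the original answer):

# cumcount follows row order, so sort by share first if needed
df = df.sort_values(['Region ID', 'Brand share'], ascending=[True, False])
df['RegionRank'] = df.groupby('Region ID').cumcount() + 1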
Or GroupBy.rank, but note it returns the same rank for tied values:
df['RegionRank'] = (df.groupby('Region ID')['Brand share']
.rank(method='dense', ascending=False)
.astype(int))
cols = ['Location store Type ID', 'Region ID']
df['StoreTypeRank'] = (df.groupby(cols)['Brand share']
.rank(method='dense', ascending=False)
.astype(int))
print (df)
Region ID Location store Type ID Brand share RegionRank StoreTypeRank
0 1 Warehouse 1.97 1 1
1 1 Warehouse 0.24 2 2
2 1 Super Centre 0.21 3 1
3 1 Warehouse 0.13 4 3
4 1 Mini Warehouse 0.10 5 1
5 1 Super Centre 0.07 6 2
6 1 Mini Warehouse 0.04 7 2
7 1 Super Centre 0.02 8 3
8 1 Mini Warehouse 0.02 8 3
9 10 Warehouse 0.64 1 1
10 10 Mini Warehouse 0.18 2 1
11 10 Warehouse 0.13 3 2
12 10 Warehouse 0.09 4 3
13 10 Super Centre 0.07 5 1
14 10 Mini Warehouse 0.03 6 2
15 10 Mini Warehouse 0.02 7 3 <-same value .02
16 10 Super Centre 0.02 7 2 <-same value .02
I have the following df: a visitor can make multiple visits, and the number of page views is recorded for each visit.
import pandas as pd

df = pd.DataFrame({'visitor_id':[1,1,2,1],'visit_id':[1,2,1,3], 'page_views':[10,20,30,40]})
page_views visit_id visitor_id
0 10 1 1
1 20 2 1
2 30 1 2
3 40 3 1
What I need is to create an additional column called weight, which diminishes with visit age by a certain parameter. For example, if this parameter is 1/2, the newest visit has a weight of 1, the 2nd newest a weight of 1/2, the 3rd 1/4, and so on.
E.g. I want my dataframe to look like:
page_views visit_id visitor_id weight
0 10 1(oldest) 1 0.25
1 20 2 1 0.5
2 30 1(newest) 2 1
3 40 3(newest) 1 1
Then I will be able to aggregate using the weight, e.g. (df.page_views * df.weight).groupby(df.visitor_id).sum(), to get weighted page views per visitor.
This doesn't work as expected (note: under Python 2, 1/2 is integer division and equals 0, which is why the weights below collapse to 0 and 1; use 0.5 or Python 3):
df = pd.DataFrame({'visitor_id':[1,1,2,2,1,1],'visit_id':[5,6,1,2,7,8], 'page_views':[10,20,30,30,40,50]})
df['New']=df.groupby('visitor_id').visit_id.transform('max') - df.visit_id
df['weight'] = pd.Series([1/2]*len(df)).pow(df.New.values)
df
page_views visit_id visitor_id New weight
0 10 5 1 3 0
1 20 6 1 2 0
2 30 1 2 1 0
3 30 2 2 0 1
4 40 7 1 1 0
5 50 8 1 0 1
Is this what you need?
df.groupby('visitor_id').visit_id.apply(lambda x : 1*1/2**(max(x)-x))
Out[1349]:
0 0.25
1 0.50
2 1.00
3 1.00
Name: visit_id, dtype: float64
Maybe try this
df['New']=df.groupby('visitor_id').visit_id.transform('max')-df.visit_id
pd.Series([1/2]*len(df)).pow(df.New.values)
Out[45]:
0 0.25
1 0.50
2 1.00
3 1.00
Name: New, dtype: float64
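Putting it together, a minimal sketch of the weighted page views the question asks for (the decay name is mine; this assumes Python 3, where 1/2 evaluates to 0.5):

import pandas as pd

df = pd.DataFrame({'visitor_id':[1,1,2,1],'visit_id':[1,2,1,3], 'page_views':[10,20,30,40]})

decay = 0.5  # newest visit gets weight 1, next 0.5, then 0.25, ...
age = df.groupby('visitor_id')['visit_id'].transform('max') - df['visit_id']
df['weight'] = decay ** age
weighted_views = (df['page_views'] * df['weight']).groupby(df['visitor_id']).sum()
print(weighted_views)
# visitor 1: 10*0.25 + 20*0.5 + 40*1 = 52.5; visitor 2: 30*1 = 30.0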