How to calculate when inventory will run out using pandas? - python

Suppose I have a DataFrame like so:
Item    Check Date  Inventory
Apple   1/1/2020    50
Banana  1/1/2020    80
Apple   1/2/2020    75
Banana  1/2/2020    300
Apple   2/1/2020    100
Apple   2/2/2020    98
Banana  2/2/2020    341
Apple   2/3/2020    95
Banana  2/3/2020    328
Apple   2/4/2020    90
Apple   2/5/2020    85
Banana  2/5/2020    325
I want to find the average rate of change in the inventory for a given item starting from the max inventory count, then use that to compute what day the inventory will reach zero.
So for apples, starting from 2/1, it would be (2 + 3 + 5 + 5) / 4 = 3.75; similarly for bananas, starting from 2/2, (13 + 3) / 2 = 8.
Since there are different items, I have used:
apples = df[df["Item"] == "Apple"]
to get a dataframe for just the apples, then used:
apples["Inventory"].idxmax()
to find the row with the max inventory count.
However, this gives me the row label from the original dataframe, so I'm not sure where to go from here. My plan was to get the date from the row with the max inventory count and then ignore any dates before it.

You can still use idxmax, but combined with transform:
s = df[df.index >= df.groupby('Item').Inventory.transform('idxmax')]
out = s.groupby('Item')['Inventory'].apply(lambda x: -x.diff().mean())
Item
Apple 3.75
Banana 8.00
Name: Inventory, dtype: float64
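The question also asks on what day the inventory will hit zero. A minimal sketch of one way to project that from out, assuming Check Date has been parsed with pd.to_datetime and that checks are roughly daily, so the average drop per check can be treated as a drop per day:
import pandas as pd

# Assumption: checks happen roughly daily, so the average drop per check
# approximates a drop per day.
df['Check Date'] = pd.to_datetime(df['Check Date'])

last = df.sort_values('Check Date').groupby('Item').last()   # latest check per item
days_left = last['Inventory'] / out                          # out is the Series above
run_out_date = last['Check Date'] + days_left * pd.Timedelta(days=1)
print(run_out_date)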

Related

How to get consecutive pairs in pandas data frame and find the date difference for valid pairs

Input Data:
sn  fruits  Quality  Date
1   Apple   A        2022-09-01
2   Apple   A        2022-08-15
3   Apple   A        2022-07-15
4   Apple   B        2022-06-01
5   Apple   A        2022-05-15
6   Apple   A        2022-04-15
7   Banana  A        2022-08-15
8   Orange  A        2022-08-15
Get the average date difference for each type of fruit, but only for consecutive records with Quality = A.
If there are three consecutive rows of quality A, only the first two form a valid pair; the third one is not part of a valid pair because the 4th record has Quality = B.
So in the data above we have 2 valid pairs for Apple: the 1st pair (1, 2) with a 15-day date difference and the 2nd pair (5, 6) with a 15-day difference, so the average for Apple is 15 days.
Expected output:
fruits  avg time diff
Apple   15 days
Banana  null
Orange  null
How can I do this in pandas without any explicit looping?
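A minimal loop-free sketch of one way to do this, assuming "valid pairs" means non-overlapping consecutive pairs (1st-2nd, 3rd-4th, ...) inside each unbroken run of Quality A per fruit; column names follow the table above:
import pandas as pd

# Sketch only: assumes the frame is ordered as shown, with columns
# 'sn', 'fruits', 'Quality', 'Date'.
df['Date'] = pd.to_datetime(df['Date'])

# Label consecutive runs of identical Quality within each fruit.
runs = (df['Quality'] != df.groupby('fruits')['Quality'].shift()).cumsum()

a = df.assign(run=runs)[df['Quality'] == 'A'].copy()
# Within each run, pair rows 0-1, 2-3, ...; incomplete pairs are dropped below.
a['pair'] = a.groupby(['fruits', 'run']).cumcount() // 2

grp = a.groupby(['fruits', 'run', 'pair'])['Date']
gaps = (grp.max() - grp.min())[grp.size() == 2]     # keep only complete pairs

avg = gaps.groupby(level='fruits').mean().reindex(df['fruits'].unique())
print(avg)   # fruits with no valid pair come out as NaT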

Find how often products are sold together in Python DataFrame

I have a dataframe that is structured like the one below, but with 300 different products and about 20,000 orders.
Order    Avocado  Mango  Chili
1546     500      20     0
861153   200      500    5
1657446  500      20     0
79854    200      500    1
4654     500      20     0
74654    0        500    800
I found out which combinations often occur together with this code (abbreviated here to 3 products).
size = df.groupby(['AVOCADO', 'MANGO', 'CHILI'], as_index=False).size().sort_values(by=['size'], ascending=False)
Now I want to know per product how often it is bought solo and how often with other products.
Something like this would be my ideal output (fictional numbers), where each percentage shows what share of all orders containing that product also included the other product:
Product  Avocado  Mango  Chili
AVOCADO  100%     20%    1%
MANGO    20%      100%   3%
CHILI    20%      30%    100%
First we replace actual quantities by 1s and 0s to indicate if the products were in the order or not:
df2 = 1*(df.set_index('Order') > 0)
Then I think the easiest approach is matrix algebra wrapped in a dataframe. Also, given the size of your data, it is a good idea to go directly to numpy rather than trying to manipulate the dataframe.
For actual numbers of orders that contain (product1,product2), we can do
df3 = pd.DataFrame(data=df2.values.T @ df2.values, columns=df2.columns, index=df2.columns)
df3 looks like this:
         Avocado  Mango  Chili
Avocado        5      5      2
Mango          5      6      3
Chili          2      3      3
E.g. there are 2 orders that contain both Avocado and Chili.
If you want percentages as in your question, we need to divide by the total number of orders with the given product. Again, I think going to numpy directly is best:
df4 = pd.DataFrame(data=(df2.values / np.sum(df2.values, axis=0)).T @ df2.values, columns=df2.columns, index=df2.columns)
df4 is:
         Avocado   Mango  Chili
Avocado  1         1      0.4
Mango    0.833333  1      0.5
Chili    0.666667  1      1
The 'main' product is in the index and its companion in the columns, so for example, of the orders containing Mango, 0.833333 also had Avocado and 0.5 had Chili.
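As a small alternative (not part of the original answer): since the diagonal of df3 already holds the per-product order counts, the same percentage table can also be obtained by dividing each row of df3 by its diagonal entry:
import numpy as np

# Row-normalise the co-occurrence counts by each product's own order count.
df4_alt = df3.div(np.diag(df3), axis=0)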

Counting non-filtered value_counts along with filtered values in pandas

Assuming that I have a dataframe of pastries
Pastry Flavor Qty
0 Cupcake Cheese 3
1 Cakeslice Chocolate 2
2 Tart Honey 2
3 Croissant Raspberry 1
And I get the value count of a specific flavor per pastry
df[df['Flavor'] == 'Cheese']['Pastry'].value_counts()
Cupcake 4
Tart 4
Cakeslice 3
Turnover 3
Creampie 2
Danish 2
Bear Claw 2
Then to get the percentile of that flavor qty, I could do this
df[df['Flavor'] == 'Cheese']['Pastry'].value_counts().describe(percentiles=[.75, .85, .95])
And I'd get something like this (from the full dataframe):
count 35.00000
mean 1.485714
std 0.853072
min 1.000000
50% 1.000000
75% 2.000000
85% 2.000000
95% 3.300000
max 4.000000
Here the total number of different cheese-flavored pastries is 35, so the total cheese qty is distributed amongst those 35 pastries. The mean qty is 1.48, the max qty is 4 (cupcake and tart), etc.
What I want to do is bring that 95th percentile down by also counting all the rows whose Flavor is not 'Cheese'. However, value_counts() only counts the 'Cheese' rows because I filtered the dataframe. How can I also count the non-Cheese rows, so that my percentiles go down and represent the distribution of Cheese across the entire dataframe?
This is an example output:
Cupcake 4
Tart 4
Cakeslice 3
Turnover 3
Creampie 2
Danish 2
Bear Claw 2
Swiss Roll 1
Baklava 0
Cannoli 0
Here the non-cheese-flavored pastries are included with a qty of 0; from there I can just take the percentiles, which will be lower since the 0 values now dilute them.
I decided to try the long way to solve this, and my result gave me the same answer as this question.
Here is the long way, in case anyone is curious.
pastries = {}
for p in df['Pastry'].unique():
    pastries[p] = df[(df['Flavor'] == 'Cheese') & (df['Pastry'] == p)]['Pastry'].count()
newdf = pd.DataFrame(list(pastries.items()), columns=['Pastry', 'Count'])
newdf.describe(percentiles=[.75, .85, .95])
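For completeness, a loop-free equivalent of the long way above (my own sketch, not from the linked question): counting the 'Cheese' rows per pastry directly means pastries with no cheese rows show up as 0 and dilute the percentiles as desired.
# True/False per row for 'Cheese', summed per pastry; pastries without any
# cheese-flavored rows get a count of 0.
cheese_per_pastry = df['Flavor'].eq('Cheese').groupby(df['Pastry']).sum()
cheese_per_pastry.describe(percentiles=[.75, .85, .95])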

Pandas group and join

I am new to pandas. I want to analyse the following case. Say a fruit market publishes the prices of its fruits daily from 18:00 to 22:00, updating the price list every half hour within that time slot. Suppose the market gives the prices of the fruits at 18:00 as follows,
Fruit Price
Apple 10
Banana 20
After half an hour at 18:30, the list has been updated as follows,
Fruit Price
Apple 10
Banana 21
Orange 30
Grapes 25
Pineapple 65
I want to check whether the prices of the fruits have changed between the more recent list [18:30] and the earlier one [18:00].
Here is the result I want:
Fruit 18:00 18:30
Banana 20 21
To solve this I am thinking of doing the following:
1) Add a time column to the two data frames.
2) Merge the tables into one.
3) Make a pivot table with the fruit name as the index and ['Time', 'Price'] as the columns.
I don't know how to intersect the two data frames grouped by time, i.e. how to get the common rows of the two data frames.
You don't need to pivot in this case; we can simply use merge with the suffixes argument to get the desired result:
df_update = pd.merge(df, df2, on='Fruit', how='outer', suffixes=['_1800h', '_1830h'])
Fruit Price_1800h Price_1830h
0 Apple 10.0 10.0
1 Banana 20.0 21.0
2 Orange NaN 30.0
3 Grapes NaN 25.0
4 Pineapple NaN 65.0
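To get only the fruits whose price actually changed (the Banana row in the desired output), here is a small follow-up sketch on top of the merge result; it assumes a fruit missing at 18:00 does not count as "changed":
# Keep rows present in both snapshots whose price differs.
changed = df_update[
    df_update['Price_1800h'].notna()
    & (df_update['Price_1800h'] != df_update['Price_1830h'])
]
print(changed[['Fruit', 'Price_1800h', 'Price_1830h']])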
Edit
Why are we using the outer argument? We want to keep all the new data that appears in df2. If we use inner, for example, we will not get the newly added fruits, as shown below (unless that is the output the OP wants, which is not clear here).
df_update = pd.merge(df, df2, on='Fruit', how='inner', suffixes=['_1800h', '_1830h'])
Fruit Price_1800h Price_1830h
0 Apple 10 10.0
1 Banana 20 21.0
If Fruit is the index of your data frames, the following code should work. The idea is to return the rows where the two prices are unequal:
df['1800'] = df1['Price']
df['1830'] = df2['Price']
print(df.loc[df['1800'] != df['1830']])
You can also use datetime values in your column headings.

Using a Pandas DataFrame as Lookup

I have 2 pandas DataFrames, this one:
item inStock description
Apples 10 a juicy treat
Oranges 34 mediocre at best
Bananas 21 can be used as phone prop
<...many other fruits...>
Kiwi 0 too fuzzy
and a lookup table with only a subset of the items above:
item Price
Apples 1.99
Oranges 6.99
I would like to scan through the first table and fill in a price column for the DataFrame when the fruit in the first DataFrame matches the fruit in the second:
item inStock description Price
Apples 10 a juicy treat 1.99
Oranges 34 mediocre at best 6.99
Bananas 21 can be used as phone prop
<...many other fruits...>
Kiwi 0 too fuzzy
I've looked at examples with the built-in lookup function, as well as using a where-in type function but I cannot seem to get the syntax to work. Can someone help me out?
import pandas as pd
df_item = pd.read_csv('Item.txt')
df_price = pd.read_csv('Price.txt')
df_final = pd.merge(df_item, df_price, on='item', how='left')
print(df_final)
output
item inStock description Price
0 Apples 10 a juicy treat 1.99
1 Oranges 34 mediocre at best 6.99
2 Bananas 21 can be used as phone prop NaN
3 Kiwi 0 too fuzzy NaN
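An alternative to merge (my own sketch, not part of the answer above) is to treat the price table as a lookup Series and map it onto the item column:
# Build a Series indexed by item, then map it; items without a price get NaN.
price_lookup = df_price.set_index('item')['Price']
df_item['Price'] = df_item['item'].map(price_lookup)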
