Fill values based on adjacent column - python

How could I create create_col? For each row, find the previous time that fruit was mentioned and check whether the wanted column was yes?
   wanted  fruit  create_col
0  yes     apple
1          pear
2          pear              < last time pear was mentioned, wanted was not yes, so blank
3          apple  True       < last time apple was mentioned, wanted was yes, so True

df
###
   wanted  fruit
0  yes     apple
1          pear
2  yes     pear
3          apple
4          mango
5          pear
import numpy as np

# running concatenation of the fruits whose 'wanted' is yes
df['cum_list'] = df[df['wanted'].eq('yes')]['fruit'].cumsum()
# shift so each row only sees fruits from previous rows, then forward-fill
df['cum_list'] = df['cum_list'].shift(1).ffill()
df.fillna('', inplace=True)
# flag rows whose fruit appears in the accumulated string
df['create_col'] = np.where(df.apply(lambda x: x['fruit'] in x['cum_list'], axis=1), True, '')
df.drop(columns=['cum_list'], inplace=True)
df
###
   wanted  fruit  create_col
0  yes     apple
1          pear
2  yes     pear
3          apple  True
4          mango
5          pear   True
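
Note that the substring test (x['fruit'] in x['cum_list']) can misfire when one fruit name contains another (e.g. 'apple' would match inside an accumulated 'pineapple'). A minimal sketch of a loop-based alternative that tracks exact names in a set:

# Track fruits already seen with wanted == 'yes'; only exact name
# matches (not substrings) set the flag.
seen = set()
flags = []
for wanted, fruit in zip(df['wanted'], df['fruit']):
    flags.append(True if fruit in seen else '')
    if wanted == 'yes':
        seen.add(fruit)
df['create_col'] = flags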

Related

filter df using key words

I have a df:
item_name     price  stock
red apple     2      2
green apple   4      1
green grape   4      3
yellow apple  1      2
purple grape  4      1
I have another df:
Key Word   Min_stock
red;grape  2
The result I would like to get is:
item_name    price  stock
red apple    2      2
green grape  4      3
I would like to filter the first df based on the second df: I would like to select item_name values that contain either key word from the Key Word column.
Is there any way to achieve it?
Assuming df1 and df2 are the DataFrames, you can compute a regex from the split and exploded df2, then extract and map the min values and filter with boolean indexing:
s = (df2.assign(kw=df2['Key Word'].str.split(';'))
        .explode('kw')
        .set_index('kw')['Min_stock']
     )
# kw
# red       2
# grape     2
# blue     10
# apple    10

regex = '|'.join(s.index)
# 'red|grape|blue|apple'

mask = df1['item_name'].str.extract(f'({regex})', expand=False).map(s)
# 0     2
# 1    10
# 2     2
# 3    10
# 4     2

out = df1[mask.notna() & df1['stock'].ge(mask)]
output:
   item_name    price  stock
0  red apple    2      2
2  green grape  4      3
NB. for generalization, I used a different df2 as input:
Key Word    Min_stock
red;grape   2
blue;apple  10
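
For reference, a minimal self-contained sketch of the whole pipeline (the DataFrame constructors below are assumptions reconstructed from the tables above):

import pandas as pd

df1 = pd.DataFrame({'item_name': ['red apple', 'green apple', 'green grape',
                                  'yellow apple', 'purple grape'],
                    'price': [2, 4, 4, 1, 4],
                    'stock': [2, 1, 3, 2, 1]})
df2 = pd.DataFrame({'Key Word': ['red;grape', 'blue;apple'],
                    'Min_stock': [2, 10]})

s = (df2.assign(kw=df2['Key Word'].str.split(';'))
        .explode('kw')
        .set_index('kw')['Min_stock'])
regex = '|'.join(s.index)
mask = df1['item_name'].str.extract(f'({regex})', expand=False).map(s)
print(df1[mask.notna() & df1['stock'].ge(mask)])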

How do I create a table to match based on different column values? If statements?

I have a dataset and I am looking to see if there is a way to match data based on column values.
col-A   col-B
Apple   squash
Apple   lettuce
Banana  Carrot
Banana  Carrot
Banana  Carrot
dragon  turnip
melon   potato
melon   potato
pear    potato
Match:
if col-A matches another row's col-A and col-B doesn't match
if col-B matches another row's col-B and col-A doesn't match
col-A  col-B
Apple  squash
Apple  lettuce
melon  potato
melon  potato
pear   potato
So, if I understand correctly, you want to select the rows such that grouping by col-A (resp. col-B) and then by col-B (resp. col-A) leads to more than one group.
I would suggest:
grA = df.groupby("col-A").filter(lambda x: x.groupby("col-B").ngroups > 1)
grB = df.groupby("col-B").filter(lambda x: x.groupby("col-A").ngroups > 1)
Leading to:
grA
   col-A  col-B
0  Apple  squash
1  Apple  lettuce
and
grB
   col-A  col-B
6  melon  potato
7  melon  potato
8  pear   potato
Merging the two dataframes will lead to the desired output.
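
A sketch of that last step, using concat (which preserves the original index) plus an index sort, assuming pandas is imported as pd and the question's row order is wanted:

out = pd.concat([grA, grB]).sort_index()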
IIUC, you need to compute two masks that identify which groups have more than one distinct match in the other column:
m1 = df.groupby('col-B')['col-A'].transform('nunique').gt(1)
m2 = df.groupby('col-A')['col-B'].transform('nunique').gt(1)
out = df[m1 | m2]
Output:
   col-A  col-B
0  Apple  squash
1  Apple  lettuce
6  melon  potato
7  melon  potato
8  pear   potato
You can also get the unique/exclusive pairs with:
df[~(m1 | m2)]
   col-A   col-B
2  Banana  Carrot
3  Banana  Carrot
4  Banana  Carrot
5  dragon  turnip

How to find out for each product, how many were sold in Pandas DataFrame

I have a pandas dataframe like this:
   Name     Product  Amount
0  Bob      Apple    1
1  Bob      Banana   2
2  Jessica  Orange   3
3  Jessica  Banana   4
4  Jessica  Tomato   3
5  Mary     Banana   2
6  John     Apple    3
7  John     Grape    1
import pandas as pd

data = [('Bob', 'Apple', 1), ('Bob', 'Banana', 2), ('Jessica', 'Orange', 3),
        ('Jessica', 'Banana', 4), ('Jessica', 'Tomato', 3), ('Mary', 'Banana', 2),
        ('John', 'Apple', 3), ('John', 'Grape', 1)]
df = pd.DataFrame(data, columns=['Name', 'Product', 'Amount'])
What I have done so far:
l = []
count = 0
for i in range(0, 8):
    row = df.iloc[i]
    if row.Product not in l:
        l.append(row.Product)
Now, l contains all the unique values in the Product column, but I need the total amount as well.
How can I find out for each product how many items were sold (for example, 4 units of Apple were sold)?
You're looking for the .groupby() function:
print(df.groupby('Product')['Amount'].sum())
Prints:
Product
Apple     4
Banana    8
Grape     1
Orange    3
Tomato    3
Name: Amount, dtype: int64
out = df.groupby('Product')['Amount'].sum()
print('{} units of Apple were sold.'.format(out.loc['Apple']))
Prints:
4 units of Apple were sold.
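
If you want a plain dict for repeated lookups (a convenience, not part of the original answer):

totals = df.groupby('Product')['Amount'].sum().to_dict()
# {'Apple': 4, 'Banana': 8, 'Grape': 1, 'Orange': 3, 'Tomato': 3}
print('{} units of Apple were sold.'.format(totals['Apple']))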

combine columns containing empty strings into one column in python pandas

I have a dataframe like below.
df = pd.DataFrame({'apple': [1, 0, 1, 0],
                   'red grape': [1, 0, 0, 1],
                   'banana': [0, 1, 0, 1]})
I need to create another column that combines these columns, separated by ';', like below:
   fruits            apple  red grape  banana
0  apple;red grape   1      1          0
1  banana            0      0          1
2  apple             1      0          0
3  red grape;banana  0      1          1
What I did was convert 1/0 to the column name/empty string, then concatenate the columns:
df['apple'] = df.apple.apply(lambda x: 'apple' if x==1 else '')
df['red grape'] = df['red grape'].apply(lambda x: 'red grape' if x==1 else '')
df['banana'] = df['banana'].apply(lambda x: 'banana' if x==1 else '')
df['fruits'] = df['apple']+';'+df['red grape']+';'+df['banana']
   apple  red grape  banana  fruits
0  apple  red grape          apple;red grape;
1                    banana  ;;banana
2  apple                     apple;;
3         red grape  banana  ;red grape;banana
The separators are all screwed up because of the empty strings. I also want the solution to be more general: I might have lots of such columns to combine, and I do not want to hardcode everything.
Does anyone know the best way to do this? Thanks a lot.
Use DataFrame.insert to add the result as the first column, DataFrame.dot for matrix multiplication with the column names plus separator, and finally Series.str.rstrip to remove the trailing separator:
df.insert(0, 'fruits', df.dot(df.columns + ';').str.rstrip(';'))
print(df)

             fruits  apple  red grape  banana
0   apple;red grape      1          1       0
1            banana      0          0       1
2             apple      1          0       0
3  red grape;banana      0          1       1
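
An equivalent per-row sketch, in case the matrix-multiplication idiom feels opaque (this starts from the original indicator-only df and joins the names of the columns that equal 1):

fruits = df.apply(lambda row: ';'.join(c for c, v in row.items() if v == 1), axis=1)
df.insert(0, 'fruits', fruits)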

Python Pandas Data Frame Inserting Many Arbitrary Values

Let's say I have a data frame that looks like this:
   A
0  Apple
1  orange
2  pear
3  apple
For index values 4-1000, I want all of them to say "watermelon".
Any suggestions?
Reindex and fill the NaNs:
import numpy as np

df = df.reindex(np.r_[:1000]).fillna('watermelon')
Or,
df = df.reindex(np.r_[:1000])
df.iloc[df['A'].last_valid_index() + 1:, 0] = 'watermelon'  # equivalently, df.iloc[4:, 0] = 'watermelon'
     A
0    Apple
1    orange
2    pear
3    apple
4    watermelon
5    watermelon
...
999  watermelon
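
A self-contained sketch of the first approach; plain range(1000) works just as well as np.r_[:1000] here:

import pandas as pd

df = pd.DataFrame({'A': ['Apple', 'orange', 'pear', 'apple']})
# rows 4..999 are created as NaN by reindex, then filled
df = df.reindex(range(1000)).fillna('watermelon')
print(df.tail(3))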
