I have the following DataFrame
Fruit Type  Fruit1      Fruit2     Fruit3        Fruit4
Berries     Raspberry   Blueberry  Passionfruit  Kiwi
Berries     Raspberry   Blueberry  Passionfruit  Kiwi
Citrus      Grapefruit  Mandarins  Lemon         Lime
Citrus      Grapefruit  Mandarins  Lemon         Lime
Melons      Rockmelon   Honeydew   Muskmelon     Zucchini
Melons      Rockmelon   Honeydew   Muskmelon     Zucchini
I'm trying to get the Fruit Type values to span across the other columns, like titles over their fruit families: one cell spanning multiple columns, as when you select several cells horizontally in Excel and click Merge Cells.
I don't know how to show the desired output on Stack Overflow, but I want 'Berries' to be a single cell spanning the columns Fruit1, Fruit2, Fruit3 and Fruit4, the same for Citrus and Melons, and then I can remove the Fruit Type column entirely.
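pandas has no direct "merge cells" operation, but a common approximation, sketched here with made-up data, is to move Fruit Type into a row MultiIndex: repeated outer labels are blanked (sparsified) on display, and to_html() renders them as a single cell with a rowspan, which is the closest pandas analogue of Excel's merged cells.

```python
import pandas as pd

# A minimal sketch with made-up data: group the rows under one family label
# by moving 'Fruit Type' into the index alongside a per-group counter.
df = pd.DataFrame({
    'Fruit Type': ['Berries', 'Berries', 'Citrus', 'Citrus'],
    'Fruit1': ['Raspberry', 'Raspberry', 'Grapefruit', 'Grapefruit'],
    'Fruit2': ['Blueberry', 'Blueberry', 'Mandarins', 'Mandarins'],
    'Fruit3': ['Passionfruit', 'Passionfruit', 'Lemon', 'Lemon'],
    'Fruit4': ['Kiwi', 'Kiwi', 'Lime', 'Lime'],
})
grouped = df.set_index(['Fruit Type', df.groupby('Fruit Type').cumcount()])
print(grouped)  # repeated family labels are shown only once (sparsified)
```

When rendered with grouped.to_html(), the repeated outer labels become merged cells spanning their group's rows.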
I have a dataframe with information, where the rows are not related to each other:
Fruits Vegetables Protein
1 Apple Spinach Beef
2 Banana Cucumber Chicken
3 Pear Carrot Pork
I essentially just want to create a pandas series with all of that information, I want it to look like this:
All Foods
1 Apple
2 Banana
3 Pear
4 Spinach
5 Cucumber
6 Carrot
7 Beef
8 Chicken
9 Pork
How can I do this in pandas?
Dump into numpy and create a new dataframe:
out = df.to_numpy().ravel(order='F')
pd.DataFrame({'All Foods' : out})
All Foods
0 Apple
1 Banana
2 Pear
3 Spinach
4 Cucumber
5 Carrot
6 Beef
7 Chicken
8 Pork
Just pd.concat them together and reset the index:
all_foods = pd.concat([foods[col] for col in foods.columns], ignore_index=True)
You can unstack the dataframe to get the values and then create a df/series:
df = pd.DataFrame({'Fruits': ['Apple', 'Banana', 'Pear'], 'Vegetables': ['Spinach', 'Cucumber', 'Carrot'], 'Protein': ['Beef', 'Chicken', 'Pork']})
pd.DataFrame({'All Foods' : df.unstack().values})
This should help:
import pandas as pd
# Loading file with fruits, vegetables and protein
dataset = pd.read_csv('/fruit.csv')
# Unpivoting (creating one column out of 3 columns)
df_unpivot = pd.melt(dataset, value_vars=['Fruits', 'Vegetables', 'Protein'])
# Renaming column from value to All Foods
df_finalized = df_unpivot.rename(columns={'value': 'All Foods'})
# Printing out "All Foods" column
print(df_finalized["All Foods"])
I have two data frames, and I need to extract the data in "n_col3" of the second dataframe, DF2.
Question 1: How should I create "Column_3" from "Column_1" and "Column_2" of the first dataframe?
DF1 =
Column_1       Column_2  Column_3
Red Apple      small     Red Apple small
Green fruit    Large     Green fruit Large
Yellow Banana  Medium    Yellow Banana Medium
Pink Mango     Tiny      Pink Mango Tiny
Question 2: I need to extract "n_col3" from n_col1 & n_col2 in the same style as Column_3 of the first data frame (see the brackets for what should be extracted).
Note: if not all of Column_3's information is available in n_col1 & n_col2, as in rows 1 and 3, only the information that is available should be extracted.
DF2 =
n_col1 n_col2 n_col3
L854 fruit Charlie Green LTD Large    fruit Large (Green missing; fruit Large extracted)
Red alpha 8 small Tango G250 Apple    Red Apple small (all information extracted)
Mk43 Mango Beta Tiny J448 T           Mango Tiny (Pink missing; Mango Tiny extracted)
M40 Yellow Medium Romeo Banana        Yellow Banana Medium (all information extracted)
I want to extract that column so that I can do further processing and merging. Can anyone help me with this? Thank you in advance.
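One hedged way to sketch this (with hypothetical data, and naive word matching that doesn't understand context): build a vocabulary of the words DF1's Column_1/Column_2 are made of, keep only those words from DF2's n_col1 + n_col2, and order them as DF1 does (colour, fruit, size).

```python
import pandas as pd

# Sketch data modelled on the question; the column split is an assumption.
df1 = pd.DataFrame({
    'Column_1': ['Red Apple', 'Green fruit', 'Yellow Banana', 'Pink Mango'],
    'Column_2': ['small', 'Large', 'Medium', 'Tiny'],
})
# Vocabulary of known words, remembering DF1's word order
words = [w for col in ('Column_1', 'Column_2') for cell in df1[col] for w in cell.split()]
order = {w: i for i, w in enumerate(dict.fromkeys(words))}

df2 = pd.DataFrame({
    'n_col1': ['L854 fruit Charlie Green LTD', 'Red alpha 8 small'],
    'n_col2': ['Large', 'Tango G250 Apple'],
})

def extract(row):
    # keep only known words, de-duplicate, and sort into DF1's word order
    found = [w for w in (row['n_col1'] + ' ' + row['n_col2']).split() if w in order]
    return ' '.join(sorted(dict.fromkeys(found), key=order.get))

df2['n_col3'] = df2.apply(extract, axis=1)
print(df2['n_col3'])
```

Because the matching is purely word-based, a word like "Green" inside a company name would still be picked up; smarter filtering would need more context than the question gives.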
I'm struggling with the following task: I would like to identify, using pandas (or any other Python tool), whether any of multiple cells (Fruit 1 through Fruit 3) in each row of Table 2 is contained in the Fruits column of Table 1, and in the end obtain the "Contains Fruits Table 2?" column.
Fruits
apple
orange
grape
melon
Name  Fruit 1  Fruit 2  Fruit 3  Contains Fruits Table 2?
Mike  apple                      Yes
Bob   peach    pear     orange   Yes
Jack  banana                     No
Rob   peach    banana            No
Rita  apple    orange   banana   Yes
Table 2 can have up to 40 fruit columns, and Table 1 has about 300 rows.
I hope it is understandable, and someone can help me resolve this.
I really appreciate the support in advance!
Try:
filter the DataFrame to the columns whose names contain "Fruit"
use isin to check whether the values are in table1["Fruits"]
use any(axis=1) to return True if any of the fruits are found
map True/False to "Yes"/"No"
table2["Contains Fruits Table 2"] = (
    table2.filter(like="Fruit")
          .isin(table1["Fruits"].tolist())
          .any(axis=1)
          .map({True: "Yes", False: "No"})
)
>>> table2
Name Fruit 1 Fruit 2 Fruit 3 Contains Fruits Table 2
0 Mike apple None None Yes
1 Bob peach pear orange Yes
2 Jack banana None None No
3 Rob peach banana None No
4 Rita apple orange banana Yes
I have a list of strings looking like this:
strings = ['apple', 'pear', 'grapefruit']
and I have a data frame containing id and text values like this:
id  value
1   The grapefruit is delicious! But the pear tastes awful.
2   I am a big fan og apple products
3   The quick brown fox jumps over the lazy dog
4   An apple a day keeps the doctor away
Using pandas, I would like to create a filter that gives me only the id and value of the rows containing one or more of the strings, together with a column showing which values each string contains, like this:
id  value                                                    value contains substrings:
1   The grapefruit is delicious! But the pear tastes awful.  grapefruit, pear
2   I am a big fan og apple products                         apple
4   An apple a day keeps the doctor away                     apple
How would I write this using pandas?
Use .str.findall:
df['fruits'] = df['value'].str.findall('|'.join(strings)).str.join(', ')
df[df.fruits != '']
id value fruits
0 1 The grapefruit is delicious! But the pear tast... grapefruit, pear
1 2 I am a big fan og apple products apple
3 4 An apple a day keeps the doctor away apple
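One caveat worth noting, sketched below with made-up sample text: a plain '|'.join(strings) pattern matches substrings, so 'apple' would also match inside 'pineapple'. Wrapping the pattern in \b word boundaries avoids that:

```python
import pandas as pd

strings = ['apple', 'pear', 'grapefruit']
df = pd.DataFrame({'id': [1, 2],
                   'value': ['I love pineapple', 'A pear a day']})
# \b word boundaries stop 'apple' from matching inside 'pineapple'
pattern = r'\b(?:' + '|'.join(strings) + r')\b'
df['fruits'] = df['value'].str.findall(pattern).str.join(', ')
print(df)
```

With the bounded pattern, row 1 matches nothing and row 2 matches only 'pear'.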
I have a data frame in which I am trying to find all possible combinations of itself and a fraction of itself. The following data frame is a much scaled-down version of the one I am running. The first column (FruitSubDF) is a fraction of the second (FruitFullDF).
FruitSubDF  FruitFullDF
apple       apple
cherry      cherry
banana      banana
            peach
            plum
By running the following code (with from itertools import product)
df1 = pd.DataFrame(list(product(fruitDF.iloc[0:3, 0], fruitDF.iloc[0:5, 0])), columns=['fruit1', 'fruit2'])
the output is
Fruit1 Fruit2
0 apple banana
1 apple apple
2 apple cherry
3 apple peach
4 apple plum
5 cherry banana
6 cherry apple
7 cherry cherry
.
.
18 banana banana
19 banana peach
20 banana plum
My problem is that I want to remove rows containing the same two fruits regardless of which fruit is in which column, as below: I consider (apple, cherry) and (cherry, apple) to be the same. I am unsure of an efficient way, other than iterrows, to weed out the unwanted rows, since most pandas functions I find remove duplicates based on order.
Fruit1 Fruit2
0 apple banana
1 apple cherry
2 apple apple
3 apple peach
4 apple plum
5 cherry banana
6 cherry cherry
.
.
15 banana plum
First, I created a piece of code to replicate your DataFrame (adapted from another Stack Overflow answer):
import pandas as pd
Fruit1=['apple', 'cherry', 'banana']
Fruit2=['banana', 'apple', 'cherry']
index = pd.MultiIndex.from_product([Fruit1, Fruit2], names = ["Fruit1", "Fruit2"])
df = pd.DataFrame(index = index).reset_index()
Then you can use lexicographical order to filter the dataframe:
df[df['Fruit1'] <= df['Fruit2']]
This gives the result you wanted to obtain.
EDIT : you edited your post but it seems to still do the job.
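An alternative sketch (with made-up data, assuming string values): sort each row's pair with NumPy so that (cherry, apple) becomes (apple, cherry), then drop the exact duplicates that remain:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'fruit1': ['apple', 'cherry', 'cherry'],
                    'fruit2': ['cherry', 'apple', 'banana']})
# sort within each row, so duplicate unordered pairs collapse to one row
dedup = (pd.DataFrame(np.sort(df1.values, axis=1), columns=df1.columns)
           .drop_duplicates(ignore_index=True))
print(dedup)
```

This avoids any row-by-row Python loop, which matters at the asker's scale.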
You can use itertools to achieve it; combinations_with_replacement yields each unordered pair exactly once (including same-fruit pairs), whereas permutations would keep both orderings:
import itertools
import pandas as pd
fruits = ['banana', 'cherry', 'apple']
pd.DataFrame(itertools.combinations_with_replacement(fruits, 2), columns=['fruit1', 'fruit2'])