First off, I am still new to Python and have searched and have been unable to find out anywhere how to do this (from a new person's perspective)...
I have a Python pandas dataframe, and I need to print out the index, column name and value for every cell.
Let's say I have the following dataframe
EAT     DAILY  WEEKLY  YEARLY
Fruit
APPLE       2       5     200
ORANGE      1       3     100
BANANA      1       4     150
PEAR        0       1      40
I need to print it out such that I get something like the following, iterating over every row in the dataframe.
Eat Apple Daily at least 2
Eat Apple Weekly at least 5
Eat Apple Yearly at least 200
Eat Orange Daily at least 1
Eat Orange Weekly at least 3
Eat Orange Yearly at least 100
..
...
....
I have tried various combinations, but I am still learning, so any help is appreciated.
So far I have tried:
for row in test.iterrows():
    index, data = row
    print(index, data['column1'])
    print(index, data['column2'])
    print(index, data['column3'])
This gives me the index and value but not the column name. Also, I'd like it to iterate regardless of how many columns or rows are used, and I still need to be able to insert the text, which needs to be dynamic...
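For reference, a minimal sketch that keeps the iterrows attempt above but also loops over test.columns would already print the index, column name and value for any number of columns (test is assumed to be the example dataframe):

# hypothetical sketch: iterate over rows, then over every column,
# printing index, column name and value for each cell
for index, row in test.iterrows():
    for col in test.columns:
        print('Eat {} {} at least {}'.format(index, col, row[col]))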
Build a Series of strings: stack the frame, reset the index so that Fruit, EAT and the value become columns, then pass each row to str.format as keyword arguments:
f = 'Eat {Fruit} {EAT} at least {value}'.format
df.stack().reset_index(name='value').apply(lambda x: f(**x), 1)
0 Eat APPLE DAILY at least 2
1 Eat APPLE WEEKLY at least 5
2 Eat APPLE YEARLY at least 200
3 Eat ORANGE DAILY at least 1
4 Eat ORANGE WEEKLY at least 3
5 Eat ORANGE YEARLY at least 100
6 Eat BANANA DAILY at least 1
7 Eat BANANA WEEKLY at least 4
8 Eat BANANA YEARLY at least 150
9 Eat PEAR DAILY at least 0
10 Eat PEAR WEEKLY at least 1
11 Eat PEAR YEARLY at least 40
dtype: object
Or print it out directly by iterating over the stacked Series:
for idx, value in df.stack().iteritems():
    print('Eat {0[0]} {0[1]} at least {1}'.format(idx, value))
Eat APPLE DAILY at least 2
Eat APPLE WEEKLY at least 5
Eat APPLE YEARLY at least 200
Eat ORANGE DAILY at least 1
Eat ORANGE WEEKLY at least 3
Eat ORANGE YEARLY at least 100
Eat BANANA DAILY at least 1
Eat BANANA WEEKLY at least 4
Eat BANANA YEARLY at least 150
Eat PEAR DAILY at least 0
Eat PEAR WEEKLY at least 1
Eat PEAR YEARLY at least 40
You can use stack to reshape the DataFrame into a Series with a MultiIndex and then iterate with Series.iteritems (Series.items in newer pandas versions), formatting each entry:
test = test.stack()
print (test)
Fruit   EAT
APPLE   DAILY       2
        WEEKLY      5
        YEARLY    200
ORANGE  DAILY       1
        WEEKLY      3
        YEARLY    100
BANANA  DAILY       1
        WEEKLY      4
        YEARLY    150
PEAR    DAILY       0
        WEEKLY      1
        YEARLY     40
dtype: int64
for index, data in test.iteritems():
    print('Eat {} {} at least {}'.format(index[0], index[1], data))
Eat APPLE DAILY at least 2
Eat APPLE WEEKLY at least 5
Eat APPLE YEARLY at least 200
Eat ORANGE DAILY at least 1
Eat ORANGE WEEKLY at least 3
Eat ORANGE YEARLY at least 100
Eat BANANA DAILY at least 1
Eat BANANA WEEKLY at least 4
Eat BANANA YEARLY at least 150
Eat PEAR DAILY at least 0
Eat PEAR WEEKLY at least 1
Eat PEAR YEARLY at least 40
But if you really need a DataFrame, add reset_index and then loop with DataFrame.iterrows:
test = test.stack().reset_index(name='VAL')
print (test)
Fruit EAT VAL
0 APPLE DAILY 2
1 APPLE WEEKLY 5
2 APPLE YEARLY 200
3 ORANGE DAILY 1
4 ORANGE WEEKLY 3
5 ORANGE YEARLY 100
6 BANANA DAILY 1
7 BANANA WEEKLY 4
8 BANANA YEARLY 150
9 PEAR DAILY 0
10 PEAR WEEKLY 1
11 PEAR YEARLY 40
for index, data in test.iterrows():
    print('Eat {} {} at least {}'.format(data['Fruit'], data['EAT'], data['VAL']))
Eat APPLE DAILY at least 2
Eat APPLE WEEKLY at least 5
Eat APPLE YEARLY at least 200
Eat ORANGE DAILY at least 1
Eat ORANGE WEEKLY at least 3
Eat ORANGE YEARLY at least 100
Eat BANANA DAILY at least 1
Eat BANANA WEEKLY at least 4
Eat BANANA YEARLY at least 150
Eat PEAR DAILY at least 0
Eat PEAR WEEKLY at least 1
Eat PEAR YEARLY at least 40
Consider even a solution without an explicit loop, using pandas.DataFrame.to_string:
sdf = df.stack().reset_index(name='VALUE')
sdf['Output'] = sdf.apply(lambda row: "EAT {} {} at least {}".\
format(row['Fruit'], row['EAT'], row['VALUE']), axis=1)
# PRINT TO CONSOLE
print(sdf[['Output']].to_string(header=False, index=False, justify='left'))
# WRITE TO TEXT
with open('Output.txt', 'w') as f:
    f.write(sdf[['Output']].to_string(header=False, index=False, justify='left'))
# EAT APPLE DAILY at least 2
# EAT APPLE WEEKLY at least 5
# EAT APPLE YEARLY at least 200
# EAT ORANGE DAILY at least 1
# EAT ORANGE WEEKLY at least 3
# EAT ORANGE YEARLY at least 100
# EAT BANANA DAILY at least 1
# EAT BANANA WEEKLY at least 4
# EAT BANANA YEARLY at least 150
# EAT PEAR DAILY at least 0
# EAT PEAR WEEKLY at least 1
# EAT PEAR YEARLY at least 40
You may notice a justification issue, which is currently a reported bug on this method. Of course, you can remedy it with general string handling (strip(), replace()) in base Python.
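A minimal sketch of that cleanup, using the sdf frame built above: strip the leading whitespace from each rendered line before printing or writing it.

# strip leading/trailing spaces from every rendered line
lines = sdf[['Output']].to_string(header=False, index=False).splitlines()
cleaned = '\n'.join(line.strip() for line in lines)
print(cleaned)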
Related
I have a dataframe with information where the rows are not related to each other:
Fruits Vegetables Protein
1 Apple Spinach Beef
2 Banana Cucumber Chicken
3 Pear Carrot Pork
I essentially just want to create a pandas Series with all of that information. I want it to look like this:
All Foods
1 Apple
2 Banana
3 Pear
4 Spinach
5 Cucumber
6 Carrot
7 Beef
8 Chicken
9 Pork
How can I do this in pandas?
Dump into numpy and create a new dataframe:
out = df.to_numpy().ravel(order='F')
pd.DataFrame({'All Foods' : out})
All Foods
0 Apple
1 Banana
2 Pear
3 Spinach
4 Cucumber
5 Carrot
6 Beef
7 Chicken
8 Pork
Just pd.concat them together (and reset the index).
all_foods = pd.concat([foods[col] for col in foods.columns]).reset_index(drop=True)
You can unstack the dataframe to get the values and then create a df/series:
df = pd.DataFrame({'Fruits':['Apple','Banana', 'Pear'], 'Vegetables':['Spinach', 'Carrot', 'Cucumber'], 'Protein':['Beef', 'Chicken', 'Pork']})
pd.DataFrame({'All Foods' : df.unstack().values})
This should help:
import pandas as pd
# Loading file with fruits, vegetables and protein
dataset = pd.read_csv('/fruit.csv')
# This is where you should apply your code
# Unpivoting (creating one column out of 3 columns)
df_unpivot = pd.melt(dataset, value_vars=['Fruits', 'Vegetables', 'Protein'])
# Renaming column from value to All Foods
df_finalized = df_unpivot.rename(columns={'value': 'All Foods'})
# Printing out "All Foods" column
print(df_finalized["All Foods"])
Input Data:
sn  fruits  Quality  Date
1   Apple   A        2022-09-01
2   Apple   A        2022-08-15
3   Apple   A        2022-07-15
4   Apple   B        2022-06-01
5   Apple   A        2022-05-15
6   Apple   A        2022-04-15
7   Banana  A        2022-08-15
8   Orange  A        2022-08-15
Get the average date difference for each type of fruit, but only for records with quality = A that appear consecutively.
If there are three rows of quality A, only the first two make a valid pair; the third one is not part of a valid pair because the 4th record has quality = B.
So in the data above we have 2 valid pairs for Apple: the 1st pair = (1, 2) with a 15-day date diff and the 2nd pair = (5, 6) with a 15-day diff, so the average for Apple is 15 days.
Expected output:

fruits   avg time diff
Apple    15 days
Banana   null
Orange   null
How can I do this without any explicit looping over the pandas DataFrame?
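A rough sketch of one possible approach (the column names are taken from the table above, and Date is assumed to already be parsed as datetime): label runs of consecutive quality-A rows per fruit, pair them up inside each run, and average the pair differences.

# assumed: df holds the input data with columns 'sn', 'fruits', 'Quality', 'Date'
df = df.sort_values(['fruits', 'sn'])

# label consecutive runs of the same Quality within each fruit
df['run'] = (df['Quality'] != df.groupby('fruits')['Quality'].shift()).cumsum()

# keep quality-A rows and pair them inside each run: rows 0-1 are pair 0,
# rows 2-3 are pair 1, ...; a leftover unpaired row is dropped
a = df[df['Quality'] == 'A'].copy()
a['pair'] = a.groupby(['fruits', 'run']).cumcount() // 2
a = a[a.groupby(['fruits', 'run', 'pair'])['Date'].transform('size') == 2]

# date difference inside each pair, then the average per fruit;
# reindex so fruits without any valid pair come out as null (NaT)
bounds = a.groupby(['fruits', 'run', 'pair'])['Date'].agg(['min', 'max'])
diffs = bounds['max'] - bounds['min']
result = diffs.groupby(level='fruits').mean().reindex(df['fruits'].unique())
print(result)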
I'm struggling with the next task: I would like to identify, using pandas (or any other tool in Python), whether any of multiple cells (Fruit 1 through Fruit 3) in each row of Table 2 is contained in the Fruits column of Table 1, and in the end obtain the "Contains Fruits Table 2?" column shown below.
Table 1:

Fruits
apple
orange
grape
melon

Table 2:

Name  Fruit 1  Fruit 2  Fruit 3  Contains Fruits Table 2?
Mike  apple                      Yes
Bob   peach    pear     orange   Yes
Jack  banana                     No
Rob   peach    banana            No
Rita  apple    orange   banana   Yes
The fruit columns in Table 2 can number up to 40, and the number of rows in Table 1 is about 300.
I hope it is understandable, and someone can help me resolve this.
I really appreciate the support in advance!
Try:
Filter the DataFrame to keep only the columns whose names contain the word "Fruit".
Use isin to check whether the values are in table1["Fruits"].
Return True if any of the fruits in a row are found (any along axis=1).
Map True/False to "Yes"/"No".
table2["Contains Fruits Table 2"] = table2.filter(like="Fruit")
.isin(table1["Fruits"].tolist())
.any(axis=1)
.map({True: "Yes", False: "No"})
>>> table2
Name Fruit 1 Fruit 2 Fruit 3 Contains Fruits Table 2
0 Mike apple None None Yes
1 Bob peach pear orange Yes
2 Jack banana None None No
3 Rob peach banana None No
4 Rita apple orange banana Yes
I have a list of strings looking like this:
strings = ['apple', 'pear', 'grapefruit']
and I have a data frame containing id and text values like this:
id  value
1   The grapefruit is delicious! But the pear tastes awful.
2   I am a big fan og apple products
3   The quick brown fox jumps over the lazy dog
4   An apple a day keeps the doctor away
Using pandas, I would like to create a filter that gives me only the id and value for the rows that contain one or more of those strings, together with a column showing which of the values are contained in the string, like this:
id  value                                                     value contains substrings
1   The grapefruit is delicious! But the pear tastes awful.   grapefruit, pear
2   I am a big fan og apple products                          apple
4   An apple a day keeps the doctor away                      apple
How would I write this using pandas?
Use .str.findall:
df['fruits'] = df['value'].str.findall('|'.join(strings)).str.join(', ')
df[df.fruits != '']
id value fruits
0 1 The grapefruit is delicious! But the pear tast... grapefruit, pear
1 2 I am a big fan og apple products apple
3 4 An apple a day keeps the doctor away apple
I have a CSV and I want to count how many rows match specific columns. What would be the best way to do this? For example, if this were the CSV:
fruit days characteristic1 characteristic2
0 apple 1 red sweet
1 orange 2 round sweet
2 pineapple 5 prickly sweet
3 apple 4 yellow sweet
The output I would want would be:
1 apple: red,sweet
A CSV is a file with values that are separated by commas. I would recommend turning this into a .txt file with the same format and then establishing consistent spacing throughout the file (using a tab, for example), so that when you loop through each line you know where the actual information is. Then, once you know what info is in what column, you can print those specific values.
# Use a tab in between each column
fruit      days  charac1  charac2
0  apple      1  red      sweet
1  orange     2  round    sweet
2  pineapple  5  prickly  sweet
3  apple      4  yellow   sweet
This is just to get you started.
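Alternatively, if the data is loaded into pandas, a minimal sketch of the counting itself could look like the following (the file name and the exact output format here are assumptions based on the example above):

import pandas as pd

# hypothetical file name; columns as in the example
df = pd.read_csv('fruits.csv')

# count how many rows share each fruit + characteristics combination
counts = (df.groupby(['fruit', 'characteristic1', 'characteristic2'])
            .size()
            .reset_index(name='count'))

for _, r in counts.iterrows():
    print('{} {}: {},{}'.format(r['count'], r['fruit'],
                                r['characteristic1'], r['characteristic2']))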