Loop with df.head() - python

Beginner
I want to add a loop so that each time the input is "yes", it adds 5 to the number of rows shown by df.head().
while True:
    check = input(' Do You Wish to continue yes/no')
    if check != 'yes':
        break
    else:
        print(df.head(5))

df.head(5) shows the first 5 rows of the DataFrame. Calling it in a loop won't add any rows; you need a variable that holds the number of rows and grows by 5 each time.
I think you mean the program to work in the following manner:
import pandas as pd
df = pd.read_csv("train.csv")
i = 5
# df.shape[0] gives the number of rows
while i < df.shape[0]:
    check = input(' Do You Wish to continue yes/no: ')
    if check == 'yes':
        print(df.head(i))
        i += 5  # increment by 5
    else:
        # if the input is not 'yes', end the loop
        break
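To check the paging logic without typing at a prompt, the loop can be wrapped in a function that takes the answers as a list. This is a minimal sketch: the function name `show_in_steps`, its `answers` parameter and the 12-row sample DataFrame are hypothetical stand-ins for `input()` and `train.csv`. Unlike the loop above, it also shows the last partial chunk when the row count is not a multiple of 5.

```python
import pandas as pd


def show_in_steps(df, answers, step=5):
    """Print df.head(n) for n = step, 2*step, ... while the answer is 'yes'.

    `answers` (hypothetical) replaces input() so the logic can be tested.
    Returns the row counts that were displayed.
    """
    shown = []
    n = step
    for answer in answers:
        if answer != "yes":
            break  # any other answer ends the loop
        rows = min(n, len(df))  # never ask for more rows than exist
        print(df.head(rows))
        shown.append(rows)
        if rows == len(df):
            break  # every row has been displayed
        n += step
    return shown


# Hypothetical 12-row DataFrame standing in for train.csv
df = pd.DataFrame({"x": range(12)})
print(show_in_steps(df, ["yes", "yes", "yes", "yes"]))
```

At an interactive prompt, `answers` can be `iter(lambda: input('Do you wish to continue yes/no: '), None)`, which calls `input()` once per iteration.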

Search for string in a dataframe: first 3 letters

In this DataFrame, some values in the NOTES column start with the word "PRE". I want a new column that says "yes" for those rows and "no" otherwise.
I wrote this code for it, but it is not working.
import pandas as pd
df1 = pd.DataFrame({'NOTES': ["PREPAID_HOME_SCREEN_MAMO", "SCREEN_MAMO",
                              "> Unable to connect internet>4G Compatible>Set",
                              "No>Not Barred>Active>No>Available>Others>",
                              "Internet Not Working>>>Unable To Connect To"]})
df1['NOTES'].astype(str)
for i in df1['NOTES']:
    if i[:3] == 'PRE':
        df1['new'] = 'yes'
    else:
        df1['new'] = 'No'
df1
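Set df1['new'] to a list using a list comprehension and a conditional (ternary) expression:
df1['new'] = ['yes' if i[:3] == 'PRE' else 'no' for i in df1['NOTES']]
When setting a DataFrame column row by row, assign a list with one value per row. Assigning a single value sets every row to that value, so in your loop each iteration overwrites the whole column.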
For a case-insensitive match:
df1['new'] = ['yes' if i[:3].upper() == 'PRE' else 'no' for i in df1['NOTES']]
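The difference between assigning a single value and assigning a list is easy to see on a small frame: a scalar is broadcast to every row, while a list supplies one value per row. A minimal sketch with made-up notes (the `scalar` column exists only for the comparison):

```python
import pandas as pd

# Made-up notes for illustration
df1 = pd.DataFrame({"NOTES": ["PREPAID_HOME", "SCREEN_MAMO", "pre-order"]})

# Scalar assignment: every row gets the same value
df1["scalar"] = "yes"

# One value per row, built with a list comprehension (case-insensitive check)
df1["new"] = ["yes" if str(note)[:3].upper() == "PRE" else "no" for note in df1["NOTES"]]

print(df1)
```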
You can use a list to collect the values and then assign it to the DataFrame as a new column.
Code:
import pandas as pd
df1 = pd.DataFrame({'NOTES': ["PREPAID_HOME_SCREEN_MAMO", "SCREEN_MAMO",
                              "> Unable to connect internet>4G Compatible>Set",
                              "No>Not Barred>Active>No>Available>Others>",
                              "Internet Not Working>>>Unable To Connect To"]})
df1['NOTES'] = df1['NOTES'].astype(str)  # astype returns a new Series, so assign it back
data = []
for i in df1['NOTES']:
    if i[:3] == 'PRE':
        data.append('yes')
    else:
        data.append('no')
df1['new'] = data
The code that you posted sets every value in the 'new' column to 'yes' or 'no' on each pass of the loop, because assigning a single value to df1['new'] fills the whole column. The column therefore ends up holding only the result for the last note.
Try the following:
import pandas as pd
df1 = pd.DataFrame({'NOTES': ...})  # same NOTES data as in the question
df1['NOTES'] = df1['NOTES'].astype(str)
# placeholder list with one entry per row
new = ['*' for i in range(len(df1['NOTES']))]
for i in range(len(df1['NOTES'])):
    if df1['NOTES'][i][0:3] == "PRE":
        new[i] = 'Yes'
    else:
        new[i] = 'No'
df1['new'] = new
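A loop is not strictly needed: pandas string methods work on a whole column at once. The sketch below swaps in `Series.str.startswith` together with `numpy.where` (a different technique from the answers above) on the question's sample notes. `str.startswith` has no case-insensitive flag, so the second variant upper-cases the text first.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'NOTES': ["PREPAID_HOME_SCREEN_MAMO", "SCREEN_MAMO",
                              "> Unable to connect internet>4G Compatible>Set",
                              "No>Not Barred>Active>No>Available>Others>",
                              "Internet Not Working>>>Unable To Connect To"]})

# Boolean Series: True where the note starts with "PRE"
starts_pre = df1['NOTES'].astype(str).str.startswith('PRE')
df1['new'] = np.where(starts_pre, 'yes', 'no')

# Case-insensitive variant: normalise to upper case before testing
starts_pre_ci = df1['NOTES'].astype(str).str.upper().str.startswith('PRE')
df1['new_ci'] = np.where(starts_pre_ci, 'yes', 'no')

print(df1[['NOTES', 'new']])
```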

Python: IF statement on the index of a dataframe?

I'm working through my first project in pandas and trying to build a simple program that retrieves IDs and their information from a large CSV.
I can get it working when I print the result. However, when I put it in a conditional statement, the else branch always runs. Is this because of the way I set the index?
The intent is to input an ID and, in return, list all the rows with this ID in the CSV.
import numpy as np
import pandas as pd
file = "PartIdsWithVehicleData.csv"
excel = pd.read_csv(file, index_col="ID", dtype={'ID': "str"})
UserInput = input("Enter a part number: ")
result = excel.loc[UserInput]
# If I print this... it works
print(result)
# However, when I make it a conditional statement, it runs my else clause.
if UserInput in excel:
    print(result)
else:
    print("Part number is not in file.")
# One thing I did notice: if I check the boolean value of the part number (index), it says False.
print(UserInput in excel)
Is this what you are trying to accomplish? I had to build my own table to help visualize the process, but you should be able to accomplish what you asked with a little tweaking for your own data:
import pandas as pd
df = pd.DataFrame({
    'Part_Number': [1, 2, 3, 4, 5],
    'Part_Name': ['Name', 'Another Name', 'Third Name', 'Almost last name', 'last name']
})
user_input = int(input('Please enter part number: '))
if user_input in df['Part_Number'].tolist():
    print(df.loc[df['Part_Number'] == user_input])
else:
    print('part number does not exist')
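The `if` in the question fails because `x in df` on a DataFrame tests the column labels, not the index, so `UserInput in excel` is False for every part number. Testing against `excel.index` keeps the question's setup. A minimal sketch, using a small hypothetical frame in place of `PartIdsWithVehicleData.csv` and an illustrative helper `rows_for_part`:

```python
import pandas as pd

# Hypothetical stand-in for the CSV, indexed by string IDs as in the question
excel = pd.DataFrame(
    {"ID": ["A1", "A1", "B2"], "Vehicle": ["Car", "Truck", "Van"]}
).set_index("ID")


def rows_for_part(frame, part_id):
    """Return all rows whose index equals part_id, or None if it is absent."""
    if part_id in frame.index:  # membership on the index, not the columns
        # A list inside .loc keeps a DataFrame even when only one row matches
        return frame.loc[[part_id]]
    return None


print("A1" in excel)        # False: checks column labels
print("A1" in excel.index)  # True
print(rows_for_part(excel, "A1"))
```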

How do I find the occurrence and percentage of occurrence of a word in a string; how to fix error

Basically, I have an Excel file that I loaded into Python. I made a new column that identifies whether a word appears in each row: if it does, the value is True, otherwise False. Now I'm trying to find the percentage of True and False values in this column. Later I will try to make a table separating the True and False rows, but I need help with the percentage first. I am a beginner; I started this last week.
For the percentage problem, I decided to first write code that counts the occurrences of "true" and "false" in the column and then do some math to get the percentages, but I didn't get past counting. The code below prints 0, which is not what it is supposed to display.
import pandas as pd
import xlrd
df = pd.read_excel(r'C:\New folder\CrohnsD.xlsx')
print(df)
df['has_word_icd'] = df.apply(lambda row: True if
                              row.str.contains('ICD').any() else False, axis=1)
print(df['has_word_icd'])
#df.to_excel(r'C:\New folder\done.xlsx')
test_str = "df['has_word_icd']"
counter = test_str.count('true')
print(str(counter))
This is the updated version, and it still gives me 0. I cannot change df['has_word_icd'] because that's how the variable is introduced initially.
import pandas as pd
import xlrd
df = pd.read_excel(r'C:\New folder\CrohnsD.xlsx')
print(df)
df['has_word_icd'] = df.apply(lambda row: True if
                              row.str.contains('ICD').any() else False, axis=1)
print(df['has_word_icd'])
#df.to_excel(r'C:\New folder\done.xlsx')
test_str = (df['has_word_icd'])
count = 0
for i in range(len(test_str)):
    if test_str[i] == 'true':
        count += 1
    i += 1
print(count)
Both versions gave me the same result.
Please help me: the output from both snippets is "0", and it shouldn't be. Could somebody help me with code that directly gives me the percentages of "true" and "false"?
Here is a way to do it using a list comprehension. For the percentage, you can use the np.mean() function, because the mean of a boolean column is the fraction of True values:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': ['hello icd', 'bob', 'bob icd', 'hello'],
                   'b': ['bye', 'you', 'bob is icd better', 'bob is young']})
df['contains_word_icd'] = df.apply(lambda row:
    any([True if 'icd' in row[x] else False for x in df.columns]), axis=1)
percentage = np.mean(df['contains_word_icd'])
# 0.5 (a fraction; multiply by 100 for a percentage)
Output :
a b contains_word_icd
0 hello icd bye True
1 bob you False
2 bob icd bob is icd better True
3 hello bob is young False
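The same flag can be computed without a row-wise `apply` by running `str.contains` on each column and combining the results with `any(axis=1)`; as above, the mean of the boolean column times 100 gives the percentage. A sketch on the example frame from this answer. `case=False` makes the match ignore case, which is an assumption about whether 'ICD' and 'icd' should both count.

```python
import pandas as pd

df = pd.DataFrame({'a': ['hello icd', 'bob', 'bob icd', 'hello'],
                   'b': ['bye', 'you', 'bob is icd better', 'bob is young']})

# One boolean column per original column, True where 'icd' appears
matches = df.apply(lambda col: col.astype(str).str.contains('icd', case=False, regex=False))

# A row is flagged if any of its columns matched
df['contains_word_icd'] = matches.any(axis=1)

percent_true = df['contains_word_icd'].mean() * 100
print(df)
print(percent_true)
```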
The main problem lies here: "df['has_word_icd']". You put the expression in quotes, which to Python means it's a plain string, not the column. Correct would be
test_str = df['has_word_icd']
Then you loop through test_str. Its values are the booleans True and False, not the string 'true', so test the value itself:
count = 0
for i in range(len(test_str)):
    if test_str[i]:
        count += 1
print(count)
Then get the percentage:
percent = count / len(df['has_word_icd']) * 100
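To get the percentage of both True and False in one step, `Series.value_counts(normalize=True)` returns the share of each distinct value, and `sum()` counts the True values because booleans add up as 1 and 0. A minimal sketch on a hypothetical `has_word_icd` column (the real one comes from the Excel file):

```python
import pandas as pd

# Hypothetical flags standing in for df['has_word_icd']
df = pd.DataFrame({'has_word_icd': [True, False, True, True]})

# Share of each value, as a percentage
percentages = df['has_word_icd'].value_counts(normalize=True) * 100
print(percentages)

# Number of True values
true_count = int(df['has_word_icd'].sum())
print(true_count)
```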

apply if else condition using dataframe

With the code below I can see the data: there is one row and two columns.
I want to make a selection:
if both columns are 0, then do something;
if both are greater than 0, then do something else.
I am getting an error in the if condition. Can anyone please help me get this done?
Comment: OP, please post an example dataset here or a URL.
from pyspark.sql import *
import pandas as pd
query = "(Select empID, empDept from employee)"
df1 = spark.read.jdbc(url=url, table=query, properties=properties)
df1.show()
if df1[empID]==0 && df1[empDept]==0:
    print("less than zero")
elif df1[empID]>0 && df1[empDept]>0:
    print("greather than 0")
else
    print("do nothing")
There are multiple syntax errors in your script. Try the modified code below:
import numpy as np
if np.sum((df1["empID"] == 0) & (df1["empDept"] == 0)):
    print("less than zero")
elif np.sum((df1["empID"] > 0) & (df1["empDept"] > 0)):
    print("greather than 0")
else:
    print("do nothing")
Note that any comparison on DataFrame columns (like df1["empID"]==0) returns a Series of boolean values, so you have to handle it as a Series rather than a regular variable. Here np.sum counts the True values, and any nonzero count makes the if condition true.
df1:
empID empDept
0 1 1
Output:
greather than 0
You have some syntax mistakes:
replace the && with and
else: (the ':' is missing)
Try this:
import pandas as pd
import numpy as np
dat = np.array([[0, 0]])
df1 = pd.DataFrame(data=dat)
if df1.loc[0, 0] == 0 and df1.loc[0, 1] == 0:
    print("less than zero")
elif df1.loc[0, 0] > 0 and df1.loc[0, 1] > 0:
    print("greather than 0")
else:
    print("do nothing")
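A comparison such as `df1['empID'] == 0` gives one boolean per row, while `if` needs a single True/False; `.all()` states the intent "every row satisfies the condition" directly. For a label per row, `numpy.select` picks the first matching condition. If `df1` is the Spark DataFrame from the question, it would first need converting, for example with `df1.toPandas()` (reasonable only for small results). A sketch with hypothetical data; the `status` column and its labels are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical data in place of the employee query
df1 = pd.DataFrame({'empID': [0, 3, 5], 'empDept': [0, 2, 0]})

both_zero = (df1['empID'] == 0) & (df1['empDept'] == 0)
both_positive = (df1['empID'] > 0) & (df1['empDept'] > 0)

# Whole-frame decision: reduce each boolean Series to one value
if both_zero.all():
    print('all rows are zero in both columns')
elif both_positive.all():
    print('all rows are greater than zero in both columns')
else:
    print('mixed')

# Per-row decision: the first matching condition wins, otherwise the default
df1['status'] = np.select([both_zero, both_positive],
                          ['both zero', 'both positive'],
                          default='do nothing')
print(df1)
```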

invalid type comparison with Booleans in Pandas

Trying to clean Country (Ctry) column in pandas dataframe (origin) based on other row level data, or other dataframes with similar data. See links for example data frames.
It will eventually feed two new columns in the dataframe giving correctly formatted country and a data quality "score".
Origin Dataframe
Nafta, Countries, and States DataFrames
The function works on values that are in the lookup tables, or on blanks, but when I pass "bad" data in, it gives an invalid type comparison error. Testing this separately returns a boolean and works:
Nafta.loc[Nafta[col] == a].empty
Not sure why this doesn't work. I've tested the values, and it's Boolean to Boolean. See the custom function and lambda below.
def CountryScore(a, b, c):
    if pd.isnull(a):
        score = "blank"
        if pd.notnull(b):
            for col in States:
                if States.loc[States[col] == b].empty != True:
                    corfor = States.iloc[States.loc[States[col] == b].index[-1], 2]
                    break
                else:
                    corfor = "Bad Data"
                    continue
        elif pd.notnull(c):
            if (len(str(c).strip()) <= 5) or (len(str(c).strip()) > 9):
                corfor = "USA"
            else:
                corfor = "CAN"
        else:
            corfor = "Bad Data"
    else:
        for col in Nafta:
            if Nafta.loc[Nafta[col] == a].empty != True:
                score = "good"
                corfor = Nafta.iloc[Nafta.loc[Nafta[col] == a].index[-1], 1]
                break
            else:
                score = "pending"
                continue
        if "pending" == score:
            for col in Country:
                if Country.loc[Country[col] == a].empty != True:
                    score = "good"
                    corfor = Country.iloc[Country.loc[Country[col] == a].index[-1], 2]
                    break
                else:
                    score = "bad"
                    corfor = "Bad Data"
                    continue
    return score, corfor
origin["Origin Ctry Score"], origin["Origin Ctry Format"] = zip(*origin.apply(lambda x: CountryScore(x["Origin Ctry"], x["Origin State"], x["Origin Zip"]), axis=1))
Assume dataframes are loaded already. Thanks!!!
I was able to find my mistake. In the last column of Country, I was comparing an integer to a string; it had nothing to do with Booleans. Fixed with:
Country.loc[Country[col].astype(str) == a].empty != True
I will end up wrapping most of the lookups in this type of conversion.
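The fix can be shown in isolation: when a lookup column holds integers and the value being searched for is a string, the `==` comparison cannot match, while casting the column with `astype(str)` first puts both sides in the same type. A minimal sketch; the `Country` frame here is a hypothetical miniature and `lookup` is an illustrative helper, not part of the original function:

```python
import pandas as pd

# Hypothetical lookup table: the last column holds numeric codes
Country = pd.DataFrame({
    'Name': ['United States', 'Canada'],
    'Alpha3': ['USA', 'CAN'],
    'Numeric': [840, 124],
})


def lookup(table, value, result_col):
    """Return result_col from the last row where any column equals value.

    Every column is cast to str so that numeric columns can match string input.
    """
    for col in table:
        hits = table.loc[table[col].astype(str) == value]
        if not hits.empty:
            return hits.iloc[-1][result_col]
    return None


print(lookup(Country, '840', 'Alpha3'))     # matched via the numeric column
print(lookup(Country, 'Mexico', 'Alpha3'))  # no match anywhere
```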
