apply if else condition using dataframe - python

With the code below I can see the data: there is one row and two columns.
I want to branch on the values:
if both columns are 0, do something;
if both are greater than 0, do something else.
I am getting an error in the if condition. Can anyone please help me get this working?
Comment: OP, please post an example dataset here or a URL.
from pyspark.sql import *
import pandas as pd
query = "(Select empID, empDept from employee)"
df1 = spark.read.jdbc(url=url, table=query, properties=properties)
df1.show()
if df1[empID]==0 && df1[empDept]==0:
    print("less than zero")
elif df1[empID]>0 && df1[empDept]>0:
    print("greather than 0")
else
    print("do nothing")

There are multiple syntax errors in your script. Try the modified code below. It assumes df1 is a pandas DataFrame (a Spark DataFrame can be converted with df1.toPandas()).
import numpy as np
if np.sum((df1["empID"]==0) & (df1["empDept"]==0)):
    print("less than zero")
elif np.sum((df1["empID"]>0) & (df1["empDept"]>0)):
    print("greather than 0")
else:
    print("do nothing")
Note that any comparison on a DataFrame column (such as df1["empID"]==0) returns a Series of boolean values, one per row. It has to be reduced to a single value (here with np.sum) before it can be used in an if statement like a regular variable.
df1:
   empID  empDept
0      1        1
Output:
greather than 0
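As a minimal sketch of the point about boolean Series, the comparison results can also be reduced with .all() before they reach the if statement. The helper name classify and its return labels are made up for illustration, and the data is assumed to be a pandas DataFrame with the empID and empDept columns from the question.

```python
import pandas as pd

def classify(df: pd.DataFrame) -> str:
    """Hypothetical helper: label the empID/empDept values across all rows."""
    both_zero = (df["empID"] == 0) & (df["empDept"] == 0)    # boolean Series, one value per row
    both_positive = (df["empID"] > 0) & (df["empDept"] > 0)  # boolean Series, one value per row
    if both_zero.all():          # reduce the Series to a single True/False
        return "both zero"
    if both_positive.all():
        return "both greater than 0"
    return "do nothing"

df1 = pd.DataFrame({"empID": [1], "empDept": [1]})
print(classify(df1))
```

Using .any() instead of .all() would answer "does at least one row match" rather than "do all rows match"; with a single row the two give the same result.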

Your code has some syntax mistakes:
replace && with and
else: (the ':' is missing)
Try this:
import pandas as pd
import numpy as np
dat = np.array([[0, 0]])
df1 = pd.DataFrame(data=dat)
if df1.loc[0, 0]==0 and df1.loc[0, 1]==0:
    print("less than zero")
elif df1.loc[0, 0]>0 and df1.loc[0, 1]>0:
    print("greather than 0")
else:
    print("do nothing")

Related

Search for string in a dataframe first 3 word

In this data frame, some values in the NOTES column start with "PRE". For those rows I want to put yes in a new column, otherwise no.
I tried the code below, but it is not working.
import pandas as pd
df1 = pd.DataFrame({'NOTES': ["PREPAID_HOME_SCREEN_MAMO", "SCREEN_MAMO",
"> Unable to connect internet>4G Compatible>Set",
"No>Not Barred>Active>No>Available>Others>",
"Internet Not Working>>>Unable To Connect To"]})
df1['NOTES'].astype(str)
for i in df1['NOTES']:
    if i [:3]=='PRE':
        df1['new']='yes'
    else:
        df1['new']='No'
df1
Set df1['new'] to a list built with a list comprehension and a conditional expression:
df1['new'] = ['yes' if i[:3] == 'PRE' else 'no' for i in df1['NOTES']]
Assigning a single value such as 'yes' to df1['new'] fills the whole column with it, so build one value per row and assign them all at once.
For a case-insensitive match:
df1['new'] = ['yes' if i[:3].upper() == 'PRE' else 'no' for i in df1['NOTES']]
You can use a list to append the values and then add the list to the dataframe.
Code -
import pandas as pd
df1 = pd.DataFrame({'NOTES': ["PREPAID_HOME_SCREEN_MAMO", "SCREEN_MAMO",
                              "> Unable to connect internet>4G Compatible>Set",
                              "No>Not Barred>Active>No>Available>Others>",
                              "Internet Not Working>>>Unable To Connect To"]})
df1['NOTES'] = df1['NOTES'].astype(str)
data = []
for i in df1['NOTES']:
    if i[:3]=='PRE':
        data.append('yes')
    else:
        data.append('no')
df1['new'] = data
The code that you posted sets every value in the 'new' column to 'yes' or 'no' on each iteration, because assigning a single string to df1['new'] fills the whole column. After the loop, every row holds the result for the last note.
Try the following:
import pandas as pd
df1 = pd.DataFrame({'NOTES': ...)
df1['NOTES'] = df1['NOTES'].astype(str)
new=['*' for i in range(len(df1['NOTES']))]
for i in range(len(df1['NOTES'])):
    if df1['NOTES'][i][0:3]=="PRE":
        new[i]='Yes'
    else:
        new[i]='No'
df1['new']=new
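The loops above can also be replaced by a vectorized check. The following is a sketch rather than any answerer's method: it uses the pandas string accessor .str.startswith together with numpy.where, and the intermediate name starts_with_pre is chosen here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"NOTES": ["PREPAID_HOME_SCREEN_MAMO", "SCREEN_MAMO",
                              "> Unable to connect internet>4G Compatible>Set",
                              "No>Not Barred>Active>No>Available>Others>",
                              "Internet Not Working>>>Unable To Connect To"]})

# Boolean Series: True where the note starts with "PRE"
starts_with_pre = df1["NOTES"].astype(str).str.startswith("PRE")

# Pick "yes" or "no" for every row in one step
df1["new"] = np.where(starts_with_pre, "yes", "no")
print(df1)
```

For a case-insensitive match, the same idea should work with df1["NOTES"].str.upper().str.startswith("PRE").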

Trying to filter a CSV file with multiple variables using pandas in python

import pandas as pd
import numpy as np
df = pd.read_csv("adult.data.csv")
print("data shape: "+str(data.shape))
print("number of rows: "+str(data.shape[0]))
print("number of cols: "+str(data.shape[1]))
print(data.columns.values)
datahist = {}
for index, row in data.iterrows():
    k = (str(row['age']) + str(row['sex']) +
         str(row['workclass']) + str(row['education']) +
         str(row['marital-status']) + str(row['race']))
    if k in datahist:
        datahist[k] += 1
    else:
        datahist[k] = 1
uniquerows = 0
for key, value in datahist.items():
    if value == 1:
        uniquerows += 1
print(uniquerows)
for key, value in datahist.items():
    if value == 1:
        print(key)
df.loc[data['age'] == 58] & df.loc[data['sex'] == Male]
I have been trying to get the above code to work.
I have limited experience in coding but it seems like the issue lies with some of the columns being objects. The int64 columns work just fine when it comes to filtering.
Any assistance will be much appreciated!
df.loc[data['age'] == 58] & df.loc[data['sex'] == Male]
First, you are attempting to use a variable named Male; you probably meant the string 'Male'. Second, look at the placement of [ and ]: you extract the part of the DataFrame with age equal to 58, then the part with sex equal to Male, and then try to combine the two pieces with bitwise and. You should instead apply & to the conditions rather than to pieces of the DataFrame, that is:
df.loc[(data['age'] == 58) & (data['sex'] == 'Male')]
The int64 columns work just fine because you've specified the condition correctly as:
data['age'] == 58
However, the object column condition data['sex'] == Male should be specified as a string:
data['sex'] == 'Male'
Also, I noticed that you loaded the dataframe as df = pd.read_csv("adult.data.csv"), while the rest of the code uses data. Did you mean this instead?
data = pd.read_csv("adult.data.csv")
The query at the end combines 2 conditions, so each condition should be wrapped in parentheses inside the square brackets [ ] of the filter. If the dataframe is named data (instead of df), it should be:
data.loc[(data['age'] == 58) & (data['sex'] == 'Male')]
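A small self-contained sketch of the same filter may help. The three rows below are invented for illustration and are not taken from adult.data.csv; only the column names age, sex and workclass come from the question.

```python
import pandas as pd

# Made-up sample rows standing in for the real CSV
data = pd.DataFrame({
    "age": [58, 58, 37],
    "sex": ["Male", "Female", "Male"],
    "workclass": ["Private", "Private", "State-gov"],
})

# Each condition gets its own parentheses; & combines them row by row
mask = (data["age"] == 58) & (data["sex"] == "Male")
filtered = data.loc[mask]
print(filtered)
```

If the CSV stores its text values with surrounding whitespace, the string comparison would need data["sex"].str.strip() first; whether that applies to this file cannot be told from the post.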

Compare entire rows for equality if some condition is satisfied

Let's say I have the following data of a match in a CSV file:
name,match1,match2,match3
Alice,2,4,3
Bob,2,3,4
Charlie,1,0,4
I'm writing a Python program. Somewhere in my program I have the scores collected for a match stored in a list, say x = [1,0,4]. Using pandas I can already find whether these scores exist in the data and print "found" or "not found". However, I want my code to print the name these scores correspond to. In this case the program should output "Charlie", since Charlie has all of these values [1,0,4]. How can I do that?
I will have a large set of data so I must be able to tell which name corresponds to the numbers I pass to the program.
Yes, here's how to compare entire rows in a dataframe:
df[(df == x).all(axis=1)].index # where x is the pd.Series we're comparing to
Also, it makes life easiest if you directly set name as the index column when you read in the CSV.
import pandas as pd
from io import StringIO
df = """\
name,match1,match2,match3
Alice,2,4,3
Bob,2,3,4
Charlie,1,0,4"""
df = pd.read_csv(StringIO(df), index_col='name')
x = pd.Series({'match1':1, 'match2':0, 'match3':4})
Now you can see that doing df == x, or equivalently df.eq(x), is not quite what you want because it does an element-wise comparison and returns a DataFrame of True/False values. So you need to aggregate each row with .all(axis=1), which finds the rows where all comparison results were True...
df.eq(x).all(axis=1)
df[ (df == x).all(axis=1) ]
# match1 match2 match3
# name
# Charlie 1 0 4
...and then finally since you only want the name of such rows:
df[ (df == x).all(axis=1) ].index
# Index(['Charlie'], dtype='object', name='name')
df[ (df == x).all(axis=1) ].index.tolist()
# ['Charlie']
which is what you wanted. (I only added the spaces inside the expression for clarity).
You need to use DataFrame.loc which would work like this:
print(df.loc[(df.match1 == 1) & (df.match2 == 0) & (df.match3 == 4), 'name'])
Maybe try something like this:
import pandas as pd
import numpy as np
# Make sample data
match1 = np.array([2,2,1])
match2 = np.array([4,3,0])
match3 = np.array([3,4,4])
name = np.array(['Alice','Bob','Charlie'])
df = pd.DataFrame({'name': name, 'match1': match1, 'match2':match2, 'match3' :match3})
df
# Example of the list you want to look up
x=[1,0,4]
#x=[2,4,3]
# Returns the name Charlie together with its index (based on the values in the list x)
df['name'].loc[(df['match1'] == x[0]) & (df['match2'] == x[1]) & (df['match3'] ==x[2])]
# Make a new dataframe out of the above
mydf = pd.DataFrame(df['name'].loc[(df['match1'] == x[0]) & (df['match2'] == x[1]) & (df['match3'] ==x[2])])
# Loop that prints the name for each row of mydf
# (if more than one name matches, it prints all of them; if only one matches, it prints only that one)
for i in range(0,len(mydf)):
    print(mydf['name'].iloc[i])
You can use this. Here data is your data frame (change the name to match yours), and the values [1,0,4] are assumed to be of int type:
data = data[(data['match1'] == 1) & (data['match2'] == 0) & (data['match3'] == 4)].index
print(data[0])
print(data[0]) shows the index label of the first matching row, which is the name when name is the index column.
If the columns are of object (string) type, use this instead:
data = data[(data['match1'] == "1") & (data['match2'] == "0") & (data['match3'] == "4")].index
print(data[0])
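Since the question starts from a plain list such as x = [1,0,4], the row comparison can be wrapped in a small function. This is a sketch under assumptions: names_with_scores is a hypothetical helper, name is read as the index column, and the list is assumed to hold one score per remaining column, in column order.

```python
import pandas as pd
from io import StringIO

csv_text = """\
name,match1,match2,match3
Alice,2,4,3
Bob,2,3,4
Charlie,1,0,4"""

def names_with_scores(df, scores):
    """Hypothetical helper: return the index labels of rows equal to `scores`."""
    target = pd.Series(scores, index=df.columns)  # label the list with the column names
    row_matches = df.eq(target).all(axis=1)       # True where every column matches
    return df[row_matches].index.tolist()

df = pd.read_csv(StringIO(csv_text), index_col="name")
print(names_with_scores(df, [1, 0, 4]))
```

Returning a list means that zero, one, or several matching names are all handled the same way by the caller.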

Creating a New DataFrame Column and Filling it with an If Statement - Python

The desired outcome would be to have a new column with header 'xOver', whereby the values within xOver are determined by an if statement.
The values of xOver will either be: 1, 2, or NaN.
Value will be 1 if: data['Close'] > data['sma_5'] and data['Close'][-1] < data['sma_5']
Value will be NaN if that criteria is not satisfied.
Value will be 2 if another if and statement criteria is fulfilled (but for simplicity we can just ignore that for the purposes of solving this problem).
This is the data frame, which is called: data.
[screenshot of the data frame]
This is the code I have tried thus far:
import pandas as pd
import mplfinance as mpf
import numpy as np
data = pd.read_excel('SPY.xlsx', index_col=0, parse_dates=True)
#Create the moving average
data['sma_5'] = data['Close'].rolling(33).mean()
print(data)
def xOver(data):
    if data['Close'] > data['sma_5'] and data['Close'][-1] < data['sma_5']:
        return 1
    else:
        np.nan
data['xOver'] = xOver
print(data)
Which returns this:
[screenshot of the printed output]
Data file: SPY.xlsx
I solved this problem with the following:
data['xOver'] = np.where((data['Close'].shift(+2) < data['sma_5'].shift(+1))
                         & (data['Close'].shift(+1) > data['sma_5'].shift(+1))
                         & (data['Close'] > data['sma_5'].shift(+1)),
                         data['sma_5'], np.nan)
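For comparison, here is a sketch that follows the criterion as worded in the question (1 where the condition holds, NaN otherwise) instead of storing the sma_5 value. It rests on assumptions: data['Close'][-1] is read as "the previous row's Close", the prices are invented, and a window of 3 is used only to keep the example small.

```python
import numpy as np
import pandas as pd

# Invented closing prices, not taken from SPY.xlsx
data = pd.DataFrame({"Close": [10.0, 9.0, 11.0, 12.0, 8.0, 13.0]})
data["sma_5"] = data["Close"].rolling(3).mean()  # small window just for the demo

prev_close = data["Close"].shift(1)  # previous row's Close (NaN on the first row)

# Current Close above the average while the previous Close was below it
crossed_up = (data["Close"] > data["sma_5"]) & (prev_close < data["sma_5"])

data["xOver"] = np.where(crossed_up, 1.0, np.nan)
print(data)
```

Comparisons against NaN evaluate to False, so the first rows, where the rolling mean is not yet defined, end up as NaN in xOver.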

Loop with df.head()

Beginner here.
I want to add a loop so that each time the input is yes, it adds 5 to the number of rows shown by df.head().
while True :
    check = input(' Do You Wish to continue yes/no')
    if check != 'yes' :
        break
    else :
        print(df.head(5))
df.head(5) shows the first 5 rows of the dataframe.
Calling it in a loop won't show any additional rows; you need a variable that grows on each pass.
I think you mean the program to work in the following manner:
import pandas as pd
df = pd.read_csv("train.csv")
i = 5
# df.shape[0] gives the number of rows
while i < df.shape[0]:
    check = input(' Do You Wish to continue yes/no: ')
    if check == 'yes':
        print(df.head(i))
        i += 5  # show 5 more rows next time
    else:
        # if input is not 'yes', end the loop
        break
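The counting logic can be separated from the prompt so that it is easy to try without typing answers. This is a sketch, not the answerer's code: head_sizes is a hypothetical helper, the list of answers stands in for repeated input() calls, and unlike the loop above it also shows the final partial batch of rows.

```python
import pandas as pd

def head_sizes(n_rows, answers, step=5):
    """Hypothetical helper: yield the row count to show for each 'yes' answer."""
    shown = 0
    for answer in answers:
        # Stop on the first non-'yes' answer or once every row has been shown
        if answer.strip().lower() != "yes" or shown >= n_rows:
            break
        shown = min(shown + step, n_rows)  # grow by `step`, capped at the row count
        yield shown

df = pd.DataFrame({"value": range(12)})  # made-up data with 12 rows
for n in head_sizes(len(df), ["yes", "yes", "yes", "no"]):
    print(df.head(n))
```

In an interactive script, the answers could come from a generator that calls input() on each step instead of a fixed list.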
