Using two if statements while using excel as my data set - python

I am trying to loop through my sheet and count how many times 'Dan' has the color 'green'.
My sheet looks like this:
https://imgur.com/a/ccohMbD
import xlrd
import os
#os.chdir('C:\Users\Dan\Desktop')
cwd = os.getcwd()
excelsheet1 ='Python practice book.xlsx'
book1 = xlrd.open_workbook(excelsheet1)
sheet = book1.sheet_by_index(0)
i=0
for row in range(sheet.nrows):
    if str(sheet.cell(row,0).value) == "Dan" and 'green':
        i=i+1
    else:
        pass
print('there are', i)
This prints 2, which I understand is because Python is only checking for 'Dan' and not taking the second half of my "and" into account.
I have also tried nesting two if statements:
if str(sheet.cell(row,0).value) == "Dan":
    if str(sheet.cell(row,0).value) == "green":
        i=i+1
    else:
        pass
print('there are', i)
which returns 0.
I think it's related to how I am formatting my call.

As Klaus commented, "and" doesn't work like that. The condition is evaluated as (value == "Dan") and ('green'), and a non-empty string such as 'green' is always truthy, so the second half never filters anything out.
The example below will return your expected results. I set the "name" and "color" variables to enhance readability.
Note that your second code example would have worked fine, except that you didn't change the column index between the first "if" and the second: both read column 0.
import xlrd
import os
#os.chdir('C:\Users\Dan\Desktop')
cwd = os.getcwd()
excelsheet1 ='Python practice book.xlsx'
book1 = xlrd.open_workbook(excelsheet1)
sheet = book1.sheet_by_index(0)
i=0
for row in range(sheet.nrows):
    name = str(sheet.cell(row,0).value)
    color = str(sheet.cell(row,2).value)
    if name == "Dan" and color == "green":
        i=i+1
    # There's no need for an "else" statement here.
print('there are', i)
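The difference between the two conditions can be tried without an Excel file. This is a minimal sketch that assumes the sheet is represented by a hypothetical list of (name, other, color) tuples; the sample rows are made up for illustration.

```python
# Hypothetical stand-in for the worksheet: (name, something, color) per row.
rows = [
    ("Dan", 1, "green"),
    ("Dan", 2, "red"),
    ("Ann", 3, "green"),
]

# Buggy condition: 'green' on its own is a non-empty string, hence truthy,
# so this counts every "Dan" row regardless of color.
buggy_count = sum(1 for name, _, color in rows if name == "Dan" and 'green')

# Fixed condition: compare both columns explicitly.
fixed_count = sum(1 for name, _, color in rows if name == "Dan" and color == "green")

print(buggy_count, fixed_count)
```

With these sample rows the buggy condition counts both "Dan" rows, while the fixed one counts only the green one.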


Using Try Except to iterate through a list in Python

I'm trying to iterate through a list of NFL QBs (over 100) and create a list of links that I will use later.
The links follow a standard format; however, if there are multiple players with the same name (such as 'Josh Allen'), the link format needs to change.
I've been trying to do this with different nested while/for loops with try/except, with little to no success. This is what I have so far:
test = ['Josh Allen', 'Lamar Jackson', 'Derek Carr']
empty_list=[]
name_int = 0
for names in test:
    try:
        q_b_name = names.split()
        link1=q_b_name[1][0].capitalize()
        link2=q_b_name[1][0:4].capitalize()+q_b_name[0][0:2].capitalize()+f'0{name_int}'
        q_b = pd.read_html(f'https://www.pro-football-reference.com/players/{link1}/{link2}/gamelog/')
        q_b1 = q_b[0]
        #filter_stats is a function that only works with QB data
        df = filter_stats(q_b1)
        #raises an error (triggering the except) if the link wasn't a QB
        df.head(5)
        empty_list.append(f'https://www.pro-football-reference.com/players/{link1}/{link2}/gamelog/')
    except:
        #adds one to the variable to change the link to find the proper QB link
        name_int += 1
The result only appends the final correct link. I need to append each correct link to the empty list.
Still a beginner in Python and trying to challenge myself with different projects. Thanks!
As stated, try/except runs the code under the try block. If at any point that block fails or raises an exception, execution jumps to the block of code under the except.
There are better ways to go about this problem (for example, I'd use BeautifulSoup to simply check the HTML for the "QB" position), but since you are a beginner, I think trying to learn this process will help you understand the loops.
So what this code does:
1 It formats your player name into the link format.
2 It initializes a while loop, which it then enters.
3 It gets the table.
4a) It enters a function that checks if the table contains 'passing' stats by looking at the column headers.
4b) If it finds 'passing' in a column header, it returns True to indicate it is a "QB" type of table (keep in mind that running backs or other positions sometimes have passing stats, but we'll ignore that). If it returns True, the while loop stops and the code moves on to the next name in your test list.
4c) If it returns False, it increments your name_int and checks the next link.
5 To take care of the case where it never finds a QB table, the while loop gives up after 10 attempts.
Code:
import pandas as pd

def check_stats(q_b1):
    for col in q_b1.columns:
        if 'passing' in col.lower():
            return True
    return False

test = ['Josh Allen', 'Lamar Jackson', 'Derek Carr']
empty_list=[]
for names in test:
    name_int = 0
    q_b_name = names.split()
    link1=q_b_name[1][0].capitalize()
    qbStatsInTable = False
    while qbStatsInTable == False:
        link2=q_b_name[1][0:4].capitalize()+q_b_name[0][0:2].capitalize()+f'0{name_int}'
        url = f'https://www.pro-football-reference.com/players/{link1}/{link2}/gamelog/'
        try:
            q_b = pd.read_html(url, header=0)
            q_b1 = q_b[0]
        except Exception as e:
            print(e)
            break
        #Check if "passing" in the table columns
        qbStatsInTable = check_stats(q_b1)
        if qbStatsInTable == True:
            print(f'{names} - Found QB Stats in {link1}/{link2}/gamelog/')
            empty_list.append(f'https://www.pro-football-reference.com/players/{link1}/{link2}/gamelog/')
        else:
            name_int += 1
            if name_int == 10:
                print(f'Did not find a link for {names}')
                #Give up on this name; leaving the flag False would keep the loop running
                break
Output:
print(empty_list)
['https://www.pro-football-reference.com/players/A/AlleJo02/gamelog/', 'https://www.pro-football-reference.com/players/J/JackLa00/gamelog/', 'https://www.pro-football-reference.com/players/C/CarrDe02/gamelog/']
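The link-building and bounded-retry logic can also be separated from the network call, which makes it easier to experiment with. The following is only a sketch: build_player_id, find_qb_id and the is_qb_page callback are hypothetical names, and the set of "known" IDs is a made-up stand-in for the pd.read_html plus check_stats step.

```python
def build_player_id(full_name, suffix):
    """Build the ID used in the answer's URLs, e.g. ('Josh Allen', 2) -> 'AlleJo02'."""
    first, last = full_name.split()
    return last[0:4].capitalize() + first[0:2].capitalize() + f'0{suffix}'


def find_qb_id(full_name, is_qb_page, max_tries=10):
    """Return the first ID accepted by is_qb_page, or None after max_tries."""
    for suffix in range(max_tries):
        player_id = build_player_id(full_name, suffix)
        if is_qb_page(player_id):
            return player_id
    return None


# Hypothetical stand-in for the network check (pd.read_html + check_stats):
# pretend only these IDs point at quarterback pages.
known_qb_ids = {'AlleJo02', 'JackLa00'}

found = find_qb_id('Josh Allen', lambda pid: pid in known_qb_ids)
missing = find_qb_id('Derek Carr', lambda pid: pid in known_qb_ids)
print(found, missing)
```

Using a for loop over range(max_tries) bounds the number of attempts without a separate flag variable, and returning None makes the "never found" case explicit for the caller.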

Converting STR to INT on Dataframe doesn't work on the specific parts

I know this is an easy question, but I checked many sites on the internet and couldn't find the problem that I have.
I have a dataframe, and one of its columns holds the brand. I wanted to assign a specific number to each brand to make brand aggregation easier.
import pandas as pd
last = pd.read_pickle('pre_clustering.pkl')
random_number=9288
first=""
f=0
for i in last['brand']:
    if(type(i)==str):
        if(first == i):
            last.at[f, 'brand']= random_number
            print(last.loc[f, 'brand'])
            f=f+1
        elif(first !=i):
            first=i
            random_number= random_number +1
            last.at[f, 'brand'] = random_number
            print(last.loc[f, 'brand'])
            f=f+1
    else:
        f=f+1
brand = last['brand']
This is my code and output.
I tried everything to convert them to integers, but they are still strings. I checked my if/else conditions with print() and they work, as you can see.
What is wrong with my code? Or what should I do to convert my strings to integers?
In your code, you use the counter f as the row index into last, but last is sorted on brand, so the sequence of f values does not match the actual row index. As a result, you put the random number in the wrong rows and leave others unchanged.
To correct the code, use last.iterrows() in the for loop as follows:
for f, row in last.iterrows():
    i=row['brand']
where f is the index of the row you are dealing with, so you do not need f=f+1,
and i holds the brand in that row.
Finally, here is your code with the modifications marked in comments:
import pandas as pd
last = pd.read_pickle('pre_clustering.pkl')
random_number=9288
first=""
# f=0 (No need)
for f, row in last.iterrows(): # for i in last['brand']: (Changed: f is the actual row index)
    i=row['brand'] # (added)
    if(type(i)==str):
        if(first == i):
            last.at[f, 'brand']= random_number
            print(last.loc[f, 'brand'])
            # f=f+1 (No need)
        elif (first !=i):
            first=i
            random_number= random_number +1
            last.at[f, 'brand'] = random_number
            print(last.loc[f, 'brand'])
            # f=f+1
    #else:
    #    f=f+1
brand = last['brand']
Do your best :)
Did you try typecasting with astype('int')? More details here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html
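The brand-numbering idea from the question can also be sketched without relying on row positions at all, by keeping a dictionary from brand to number. This is a different technique from the loop above, not the original poster's method: number_brands is a hypothetical helper, it works on a plain list rather than a DataFrame column, and the start value 9289 is chosen only because the question's counter starts at 9288 and is incremented before its first use.

```python
def number_brands(brands, start=9289):
    """Map each distinct string brand to a consecutive integer, in first-seen order."""
    codes = {}
    numbered = []
    for brand in brands:
        if isinstance(brand, str):
            if brand not in codes:
                codes[brand] = start + len(codes)
            numbered.append(codes[brand])
        else:
            # Non-string entries are passed through unchanged, as in the question.
            numbered.append(brand)
    return numbered, codes


# Hypothetical brand column; the real one lives in last['brand'].
numbered, codes = number_brands(['acme', 'acme', 'bolt', None, 'acme'])
print(numbered)
print(codes)
```

Unlike the loop in the question, this gives the same number to a brand even when its rows are not adjacent, so it does not depend on the data being sorted.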

Try Statement Within for Loop not working python

I have the script below. What I am finding is that if my try statement is satisfied, the whole iteration of the first loop ends and it just goes into the next loop iteration.
What I want to happen is that, if the if statement is satisfied, it then goes back to the elif within the for loop that it sits within.
So essentially I want to go:
For loop.
Then try whether something works (I know some won't).
Then go back into a for loop (and all the subsequent loops will still work).
Ideally I would structure this as for loop - try statement - for loop, but essentially I am asking the same question, because when I do that the loop also breaks out.
Thank you, Sam
# LOOP OVER THE DICTIONARIES (CSV LINE DATA) AND PULL UP THE PARAMETER KEY TO MATCH
tester = []
updatefamilyname = IN[6]
typename_input01_value = IN[7]
typename_input02_value = IN[8]
for datarow_dict in Data_DictionaryList:
    # Get matching value from data row dictionary
    Matching_Value = str(datarow_dict[parameter_key_tomatch])
    #test.append(parameter_key_tomatch)
    #test.append(Matching_Value)
    try:
        Get_Matching_Value_LongIndex = Base_MatchingWalls_Dict[Matching_Value]
        Split_Index = Get_Matching_Value_LongIndex.split(indexplacesplit)[1]
        MatchedWall = Base_MatchingWalls[int(Split_Index)]
        #test.append(MatchedWall)
        #test.append("here2")
        for para_key, para_value in datarow_dict.items():
            #ResultOutput.append(para_key)
            #ResultOutput.append(typename_input01_value)
            #test.append(para_key)
            #test.append(para_value)
            # We then say if the paramter key is the same as the matching key, open the following
            if para_key == parameter_key_tomatch:
                #test.append("inside")
                ResultOutput.append(para_key + " equal to " + para_value)
                # Set New Family Type Name Value
                #if para_key == typename_input01_key:
                    #typename_input01_value = para_value
                    #tester.append("inside link")
                #elif para_key == typename_input02_key:
                    #typename_input02_value = para_value
                #else:
                    #print ("go")
            elif para_key != parameter_key_tomatch:
                ResultOutput.append(para_key)
                print ("not here")
                #ResultOutput.append(para_key)
                #TestParameter_ = testparavalue(MatchedWall, para_key, para_value)
            #else:
                #pass
            # We then say, if its not the same, this is when we want to do some things, could be where we change the family name if we change the name of the name and number
            else:
                print ("ouside exception")
                #ResultOutput.append("else why")
                if updatefamilyname == "Yes":
                    ResultOutput.append("update name accepted")
                    print ("yes")
                    # THIS IS WHERE WE COULD PASS IN THE OTHER ONE TO MATCH TO FORM THE CHANGE IN FAMILY TYPE NAME
                    #typename_input01_value
                    #typename_input01_value
                    # This is where we bring in
    # Anything where matching para does not have a corresponding wall type
    except:
        print ("here")
        ResultOutput.append(Matching_Value)
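One possible reason an inner loop appears to "break out" is that a bare except also catches any error raised inside that loop, so the rest of the iteration is skipped silently. The question does not confirm that this is the cause, so the following is only a sketch under that assumption: the try covers just the lookup that is expected to fail, and the inner loop runs outside it. The function name match_rows and the sample data are hypothetical.

```python
def match_rows(data_rows, key_to_match, lookup):
    """Collect results per row; rows without a lookup match are reported and skipped."""
    results = []
    for row in data_rows:
        matching_value = str(row[key_to_match])
        try:
            # Only the lookup that is expected to fail sits inside the try.
            matched = lookup[matching_value]
        except KeyError:
            results.append(f"no match for {matching_value}")
            continue  # move on to the next row
        # The inner loop runs outside the try, so an error here is not swallowed.
        for key, value in row.items():
            if key == key_to_match:
                results.append(f"{key} equal to {value} -> {matched}")
            else:
                results.append(key)
    return results


# Hypothetical data standing in for Data_DictionaryList and Base_MatchingWalls_Dict.
rows = [{"Mark": "W1", "Height": "3000"}, {"Mark": "W9", "Height": "2700"}]
walls = {"W1": "wall-0"}
print(match_rows(rows, "Mark", walls))
```

Catching KeyError specifically, rather than using a bare except, means an unrelated mistake inside the inner loop (for example concatenating a string with a number) surfaces as a visible error instead of ending the iteration quietly.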

Create Excel file from Python

My project is to process several Excel files. To do this, I would like to create a single file that contains some of the data from those files, to serve as my database. The goal is to produce graphs from this data, all automatically.
I wrote this program in Python. However, it takes 20 minutes to run. How can I optimize it?
In addition, some files contain identical variables. I would like the final file not to repeat them. How can I do this?
Here is my program:
import os
import xlrd
import xlsxwriter
from xlrd import open_workbook
wc = xlrd.open_workbook("U:\\INSEE\\table-appartenance-geo-communes-16.xls")
sheet0=wc.sheet_by_index(0)
# creation
with xlsxwriter.Workbook('U:\\INSEE\\Department61.xlsx') as bdd:
    dept61 = bdd.add_worksheet('deprt61')
    folder_path = "U:\\INSEE\\2013_telechargement2016"
    col=8
    constante3=0
    lastCol=0
    listeV = list()
    for path, dirs, files in os.walk(folder_path):
        for filename in files:
            filename = os.path.join(path, filename)
            wb = xlrd.open_workbook(filename, '.xls')
            sheet1 = wb.sheet_by_index(0)
            lastRow=sheet1.nrows
            lastCol=sheet1.ncols
            colDep=None
            firstRow=None
            for ligne in range(0,lastRow):
                for col2 in range(0,lastCol):
                    if sheet1.cell_value(ligne, col2) == 'DEP':
                        colDep=col2
                        firstRow=ligne
                        break
                if colDep is not None:
                    break
            col=col-colDep-2-constante3
            constante3=0
            for nCol in range(colDep+2,lastCol):
                constante=1
                for ligne in range(firstRow,lastRow):
                    if sheet1.cell(ligne, colDep).value=='61':
                        Q=(sheet1.cell(firstRow, nCol).value in listeV)
                        if Q==False:
                            V=sheet1.cell(firstRow, nCol).value
                            listeV.append(V)
                            dept61.write(0,col+nCol,sheet1.cell(firstRow, nCol).value)
                            for ligne in range(ligne,lastRow):
                                if sheet1.cell(ligne, colDep).value=='61':
                                    dept61.write(constante,col+nCol,sheet1.cell(ligne, nCol).value)
                                    constante=constante+1
                        elif Q==True:
                            constante3=constante3+1 # I have a problem here. I would like to count the number of variables that already exists but I find huge numbers.
                            break
            col=col+lastCol
bdd.close()
Thank you in advance for your help. :)
This one may be too broad for SO, so here are some pointers on where you can optimise. Maybe add a sample screenshot of what the sheets look like.
Wrt if sheet1.cell_value(ligne, col2) == 'DEP': can 'DEP' occur multiple times in a sheet? If it will definitely occur only once, and that is when you get your values for both colDep and firstRow, then break out of both loops: add a break to end the inner loop, then check a flag value and break out of the outer loop before it iterates again. Like so:
colDep = None # initialise to None
firstRow = None # initialise to None
for ligne in range(0,lastRow):
    for col2 in range(0,lastCol):
        if sheet1.cell_value(ligne, col2) == 'DEP':
            colDep=col2
            firstRow=ligne
            break # break out of the `col2 in range(0,lastCol)` loop
    if colDep is not None: # or just `if colDep:` if colDep will never be 0.
        break # break out of the `ligne in range(0,lastRow)` loop
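An alternative to the flag check, swapped in here for illustration rather than taken from the answer, is to put the search in a function and return as soon as the header is found, which leaves both loops at once. In this sketch find_header is a hypothetical helper and the sheet is assumed to be a plain list of rows instead of an xlrd sheet; the cell values are made up.

```python
def find_header(grid, label='DEP'):
    """Return (row, col) of the first cell equal to label, or None if absent."""
    for row_index, row in enumerate(grid):
        for col_index, value in enumerate(row):
            if value == label:
                return row_index, col_index  # leaves both loops at once
    return None


# Hypothetical sheet contents as a list of rows.
grid = [
    ['', '', ''],
    ['CODGEO', 'DEP', 'P13_POP'],
    ['61001', '61', 123],
]
print(find_header(grid))
print(find_header(grid, 'REG'))
```

Returning None for a missing header also gives the caller an explicit case to handle, instead of continuing with colDep still unset.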
I think the range in for ligne in range(0,lastRow): in your write-to-bdd blocks should start at firstRow, since you already know from scanning sheet1 for the header that rows 0 to firstRow-1 are empty.
for ligne in range(firstRow, lastRow):
That will avoid wasting time reading empty header rows.
Other considerations for cleaner code:
Use the with xlsxwriter.Workbook('U:\\INSEE\\Department61.xlsx') as bdd: syntax for clarity,
and always use double backslashes \\ inside strings, even when the backslash does not precede an escape character: 'U:\\INSEE\\Department61.xlsx'
You've used sheet1.cell_value() as well as sheet1.cell().value for your read operations. Pick one, unless you needed extended Cell info in the value=='61' case.
Read PEP-8 for how to write more readable code.

Python syntax error with arcpy UpdateCursor

I am new to python and would like to have a script that looks at a feature class and compares the values in two text fields and then populates a third field with a Y or N depending on if the values are the same or not. I think I need to use an UpdateCursor with an if statement. I have tried the following but I get a syntax error when I try to run it. I am using ArcGIS 10.1 and know that the daCursor is better but I am just trying to wrap my head around cursors and thought I would try and keep it simple for now.
#import system modules
import arcpy
from arcpy import env
import os
import sys
#set environment settings
working_fc = sys.argv[1]
working_gdb = os.path.split(working_fc)[0]
#use an update cursor to populate the field BEC_UPDATED based on the result of a query
#query = ("SELECT * FROM working_fc" "WHERE [BEC_LABEL] = [BEC_V9]")
#if the query is true, then BEC_UPDATED should be populated with "N"
#if the query is false, then BEC_UPDATED should be populated with "Y"
rows = arcpy.UpdateCursor (working_fc)
for row in rows:
if row.getValue("BEC_LABEL") == row.getValue("BEC_V9")
row.BEC_UPDATED = "N"
else
row.BEC_UPDATED = "Y"
rows.updateRow(row)
print "BEC_UPDATED field populated"
Your syntax error is caused by indentation and missing colons. Python is picky about that, so always check that when you're getting a syntax error.
rows = arcpy.UpdateCursor(working_fc)
for row in rows:
    if row.getValue("BEC_LABEL") == row.getValue("BEC_V9"):
        row.BEC_UPDATED = "N"
    else:
        row.BEC_UPDATED = "Y"
    rows.updateRow(row)
Changing this to the da.UpdateCursor syntax is essentially the same, but requires you to specify the attributes you are interested in up front. It's worth practicing, because once you get into more complex scripts it will become easier :)
fieldList = ["BEC_LABEL", "BEC_V9", "BEC_UPDATED"]
with arcpy.da.UpdateCursor(working_fc, fieldList) as cursor:
    for row in cursor:
        if row[0] == row[1]:
            row[2] = "N"
        else:
            row[2] = "Y"
        cursor.updateRow(row)
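The comparison itself can be tried without ArcGIS, since the da.UpdateCursor version above treats each row as a list in the order given by fieldList. This is a minimal sketch under that assumption: flag_updated is a hypothetical helper and the label values are made up; inside a real cursor loop you would still call cursor.updateRow(row) afterwards.

```python
def flag_updated(row):
    """Set row[2] to "N" when row[0] equals row[1], otherwise "Y"; return the row."""
    row[2] = "N" if row[0] == row[1] else "Y"
    return row


# Hypothetical rows in the order BEC_LABEL, BEC_V9, BEC_UPDATED.
rows = [
    ["ICHmw2", "ICHmw2", None],
    ["ICHmw2", "ICHdw1", None],
]
for row in rows:
    flag_updated(row)
print(rows)
```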
You've forgotten the colons and indentation in your 'if' block:
rows = arcpy.UpdateCursor (working_fc)
for row in rows:
    if row.getValue("BEC_LABEL") == row.getValue("BEC_V9"):
        row.BEC_UPDATED = "N"
    else:
        row.BEC_UPDATED = "Y"
    rows.updateRow(row)
print "BEC_UPDATED field populated"
