Error after elif statement is added to df.iterrows() loop

I was not getting any syntax errors with only the first if statement. When I add the elif, I get a syntax error on the next line:
NPI = df["NPI2"]
^
SyntaxError: invalid syntax
I'm not sure why this is happening, since the first if statement is essentially the same as the elif.
Here's my code:
for i, row in df.iterrows():
    NPI2 = row["NPI2"]
    if row["CapDesignation"] == "R" and row["BR"] >= row["CapThreshold"]:
        NPI2 = row["CapThreshold"]*row['KBETR']-.005
        df.at[i,"NPI2"] = NPI2
        df.at[i,"BR"] = NPI2/row["KBETR"]
    elif row["CapDesignation"] == "N" and row["BN"] >= row["CapThreshold"]:
        NPI2 = row["CapThreshold"]*(row["KBETR"]-row["P"])-.005
        df.at[i,"NPI2"] = NPI2
        df.at[i,"BN"] = NPI2/((row["KBETR"]-row["P"])
NPI = df["NPI2"]
df["KBETR_N"] = round(R + Delta_P + NPI,2)

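The error is reported on the line after the real problem. The last line of the elif block, df.at[i,"BN"] = NPI2/((row["KBETR"]-row["P"]), opens two parentheses but closes only one, so Python keeps reading into the next line and raises the SyntaxError there. Use a single opening parenthesis (or add a second closing one if you want row["KBETR"]-row["P"] wrapped twice):
df.at[i,"BN"] = NPI2/(row["KBETR"]-row["P"])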
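A quick way to confirm this kind of diagnosis is to let Python parse the two variants without running them. The sketch below is only an illustration: the helper name parses is made up for this example, and the source strings are copied from the question.

```python
import ast

# The line from the question: two opening parentheses, one closing.
broken = 'df.at[i,"BN"] = NPI2/((row["KBETR"]-row["P"])\nNPI = df["NPI2"]\n'
# The same line with the extra opening parenthesis removed.
fixed = 'df.at[i,"BN"] = NPI2/(row["KBETR"]-row["P"])\nNPI = df["NPI2"]\n'

def parses(source):
    """Return True if `source` is syntactically valid Python (hypothetical helper)."""
    try:
        ast.parse(source)  # parse only, nothing is executed
        return True
    except SyntaxError:
        return False

print(parses(broken))  # False
print(parses(fixed))   # True
```

Which line the SyntaxError names depends on the Python version, so it is worth checking the line above the reported one whenever the flagged line looks fine.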

Related

OpenPyXL - How to query cell borders?

New to both python and openpyxl.
Writing a py script to glom through a ton of Excel workbooks/sheets, and need to find certain cells identified by their border formatting.
I see several examples online of how to set cell borders, but I need to read them.
Specifically, I wish to identify table boundaries when the data within the table is inconsistent but the table borders are always present. So, I need to identify the cells with:
* top / left borders
* top / right borders
* bottom / left borders
* bottom / right borders
(thin borders). There is only one such table per worksheet.
Could some kind maven point me to a code sample? I would provide my code thus far, but honestly I have no idea how to begin. My code for looping through each worksheet is:
for row in range(1, ws.max_row, 1):
    for col in range(1, sheet.max_column+1):
        tmp = NumToAlpha(col)
        ref = str(tmp) + str(row)
        hasTopBorder = ws[ref].?????? <=== how do I get a boolean here?
        hasLeftBorder = ws[ref].?????? <=== how do I get a boolean here?
        hasRightBorder = ws[ref].?????? <=== how do I get a boolean here?
        hasBottomBorder = ws[ref].?????? <=== how do I get a boolean here?
        if hasTopBorder==True and hasLeftBorder==True and hasRightBorder==False and hasBottomBorder==False:
            tableTopLeftCell = tmp + str(row)
        elif hasTopBorder==True and hasLeftBorder==False and hasRightBorder==True and hasBottomBorder==False:
            tableTopRightCell = tmp + str(row)
        elif hasTopBorder==False and hasLeftBorder==True and hasRightBorder==False and hasBottomBorder==True:
            tableBottomLeftCell = tmp + str(row)
        elif hasTopBorder==False and hasLeftBorder==False and hasRightBorder==True and hasBottomBorder==True:
            tableBottomRightCell = tmp + str(row)
        if tableTopLeftCell != "" and tableTopRightCell != "" and tableBottomLeftCell != "" and tableBottomRightCell != "": break
    if tableTopLeftCell != "" and tableTopRightCell != "" and tableBottomLeftCell != "" and tableBottomRightCell != "": break
Comments/suggestions for streamlining this novice code welcome and gratefully received.
Update:
By querying a cell like this:
tst = sheet['Q17'].border
I see that I get this type of result - but how do I use it? Or convert it into the desired boolean?

Here's one way.
I test for style is not None, rather than for one specific style, because the borders could be thin, double, etc.
def getCellBorders(ws, cellRef):
    # Return a string such as 'TL' listing the sides of the cell that have a border
    tmp = ws[cellRef].border
    brdrs = ''
    if tmp.top.style is not None: brdrs += 'T'
    if tmp.left.style is not None: brdrs += 'L'
    if tmp.right.style is not None: brdrs += 'R'
    if tmp.bottom.style is not None: brdrs += 'B'
    return brdrs
refs = {}             # corner name -> (row, column)
nowInmyTable = False  # becomes True once the top-left corner has been found
for row in range(1, ws.max_row + 1):  # + 1 so the last row is included
    for col in range(1, ws.max_column + 1):
        tmp = NumToAlpha(col)  # the asker's own column-number-to-letter helper
        cellRef = str(tmp) + str(row)
        cellBorders = getCellBorders(ws, cellRef)
        if ('T' in cellBorders) or ('L' in cellBorders) or ('R' in cellBorders) or ('B' in cellBorders):
            if 'myTableTopLeftCell' not in refs:
                if ('T' in cellBorders) and ('L' in cellBorders):
                    refs['myTableTopLeftCell'] = (row, col)
                    nowInmyTable = True
            if (nowInmyTable == True) and ('L' not in cellBorders):
                if 'myTableBottomLeftCell' not in refs:
                    refs['myTableBottomLeftCell'] = (row-1, col)
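Once each cell's borders are encoded as a string of the letters T, L, R and B, as getCellBorders does, the four corner tests from the question reduce to a lookup. The following is a minimal sketch under that assumption; the function name classify_corner and the corner labels are invented for illustration and are not part of openpyxl.

```python
def classify_corner(borders):
    """Map a border string such as 'TL' to a corner label, or None.

    `borders` is assumed to use the letters T (top), L (left),
    R (right) and B (bottom), in any order.
    """
    corners = {
        frozenset("TL"): "top-left",
        frozenset("TR"): "top-right",
        frozenset("LB"): "bottom-left",
        frozenset("RB"): "bottom-right",
    }
    # Only an exact two-sided match counts as a corner.
    return corners.get(frozenset(borders))

print(classify_corner("TL"))  # top-left
print(classify_corner("LB"))  # bottom-left
print(classify_corner("T"))   # None
```

A table that is only one row or one column wide has corner cells with three or four bordered sides, which this sketch deliberately reports as None; such cases would need their own handling.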

To check whether cell Q17 has a thin left border:
from openpyxl.styles.borders import Border, Side
if sheet['Q17'].border.left.style == "thin":
    print("Cell Q17 has a thin left border")

I found a way around using the border object directly: convert it to JSON and read the border style from the result.
import json
t = sheet.cell(1,1).border
f = json.dumps(t,default=lambda x: x.__dict__)
r = json.loads(f)
s = r['left']['style']
print(s) # prints the value of the left border style
if s == 'thin':
    pass # do specific action

Variable changing gives me syntax error "Can't assign to operator"

I'm coding a text-based game in Python on a site called 'repl.it', and while trying to change a variable, I came upon this error:
Traceback (most recent call last):
File "python", line 526
SyntaxError: can't assign to operator
I'm a newbie at python, and I can't understand much, so I just want someone to fix the code and tell me how this works.
Here's the code:
#Adding hits or misses
if decision == "t3" : mothershiphit = mothershiphit + 1
elif decision == "t2" : jetdown = jetdown + 1
else: mothershipmiss = mothershipmiss + 1, mothershiplanded = mothershiplanded + 1
print " "
Link to the game: It's not finished yet, but I'll keep working on it.

Instead of using a comma (,), use a semicolon (;) to separate the statements in a single line.
if decision == "t3": mothershiphit = mothershiphit + 1
elif decision == "t2": jetdown = jetdown + 1
else: mothershipmiss = mothershipmiss + 1; mothershiplanded = mothershiplanded + 1
Explanation
With the comma, the interpreter reads the whole line as one chained assignment with two targets:
mothershipmiss = (mothershipmiss + 1, mothershiplanded) = mothershiplanded + 1
The second target is a tuple whose first element, mothershipmiss + 1, is an expression built with the + operator rather than a name, so nothing can be assigned to it. That is what "can't assign to operator" means.
With the semicolon, the line is split into two separate statements:
mothershipmiss = mothershipmiss + 1
mothershiplanded = mothershiplanded + 1
Both are valid, because each one assigns the value on the right to the name on the left.
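The difference between the two separators can be checked without touching the game code. This is a small sketch in Python 3 syntax, with the short names a and b standing in for the game's counters and a made-up helper called compiles.

```python
# Stand-ins for the two versions of the else-branch (a, b replace the game's counters).
comma_version = "a = a + 1, b = b + 1"
semicolon_version = "a = a + 1; b = b + 1"

def compiles(source):
    """Return True if `source` compiles as Python statements (hypothetical helper)."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles(comma_version))      # False
print(compiles(semicolon_version))  # True

# Running the semicolon version increments both counters.
namespace = {"a": 0, "b": 0}
exec(semicolon_version, namespace)
print(namespace["a"], namespace["b"])  # 1 1
```

Putting each statement on its own indented line under else: behaves the same as the semicolon version and is usually easier to read.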

Why do identical strings return false with ==?

This is the simplified version of my problem.
QA = open('Qestions and answers.txt')
Q = []
A = []
for line in QA:
    (first,second) = line.split(';')
    Q.append(first)
    A.append(second)
QA.close()
print(A[0], A[1])
print(A[0] == '1981')
print(A[1] == 'Feb')
print(str(A[0]) == '1981') # I even tried str
print(str(A[1]) == "Feb")
Output:
1981
Feb
False
False
False
False

You've got extra whitespace in there. My guess is this:
print(repr(A[0]))
Output:
'1981\n'
This is because when you read lines from a file, you will get the line breaks at the end of each line as well. If you don't want that, strip them out.
for line in QA:
    line = line.rstrip('\n')
    ...
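The effect is easy to reproduce without the original file. In this sketch an io.StringIO object stands in for the file, and its two lines are assumed content, since the question does not show the file itself.

```python
import io

# Assumed file content: "question;answer" on each line.
data = "Year?;1981\nMonth?;Feb\n"

# Splitting the raw lines keeps the trailing newline on the second field.
raw = [line.split(";")[1] for line in io.StringIO(data)]
print(repr(raw[0]))      # '1981\n'
print(raw[0] == "1981")  # False

# Stripping the newline first gives the expected strings.
clean = [line.rstrip("\n").split(";")[1] for line in io.StringIO(data)]
print(repr(clean[0]))     # '1981'
print(clean[0] == "1981") # True
print(clean[1] == "Feb")  # True
```

Printing with repr() is the useful habit here, because plain print() hides the trailing newline.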

strip() works for this problem
print(A[0].strip() == '1981')
print(A[1].strip() == 'Feb')
True
True

(Python) List index out of range when trying to pull data out of a .CSV?

This program pulls data out of two .CSV files, which are linked here:
https://drive.google.com/folderview?id=0B1SjPejhqNU-bVkzYlVHM2oxdGs&usp=sharing
It's supposed to read whatever comes after the comma on each line of the two files, but my range logic is somehow wrong. I get a traceback pointing to line 101:
"line 101, in calc_corr: sum_smokers_value = sum_smokers_value + float(s_percent_smokers_data[r][1])
IndexError: list index out of range"
I assume it would do the same for the other places where [k][1] shows up.
Many thanks in advance if there's a way to fix this.
The program so far is:
# this program opens two files containing data and runs a correlation calculation
import math
def main():
    try:
        print('does smoking directly lead to lung cancer?')
        print("let's find out, shall we?")
        print('to do so, this program will find correlation between the instances of smokers, and the number of people with lung cancer.')
        percent_smokers, percent_cancer = retrieve_csv()
        s_percent_smokers_data, c_percent_cancer_data = read_csv(percent_smokers, percent_cancer)
        correlation = calc_corr(s_percent_smokers_data, c_percent_cancer_data)
        print('r_value =', correlation)
    except IOError as e:
        print(str(e))
        print('this program has been cancelled. run it again.')
def retrieve_csv():
    num_times_failed = 0
    percent_smokers_opened = False
    percent_cancer_opened = False
    while ((not percent_smokers_opened) or (not percent_cancer_opened)) and (num_times_failed < 5):
        try:
            if not percent_smokers_opened:
                percent_smokers_input = input('what is the name of the file containing the percentage of smokers per state?')
                percent_smokers = open(percent_smokers_input, 'r')
                percent_smokers_opened = True
            if not percent_cancer_opened:
                percent_cancer_input = input('what is the name of the file containing the number of cases of lung cancer contracted?')
                percent_cancer = open(percent_cancer_input, 'r')
                percent_cancer_opened = True
        except IOError:
            print('a file was not located. try again.')
            num_times_failed = num_times_failed + 1
    if not percent_smokers_opened or not percent_cancer_opened:
        raise IOError('you have failed too many times.')
    else:
        return (percent_smokers, percent_cancer)
def read_csv(percent_smokers, percent_cancer):
    s_percent_smokers_data = []
    c_percent_cancer_data = []
    empty_list = ''
    # skip the header line of each file
    percent_smokers.readline()
    percent_cancer.readline()
    eof = False
    while not eof:
        smoker_list = percent_smokers.readline()
        cancer_list = percent_cancer.readline()
        if smoker_list == empty_list and cancer_list == empty_list:
            eof = True
        elif smoker_list == empty_list:
            raise IOError('smokers file error')
        elif cancer_list == empty_list:
            raise IOError('cancer file error')
        else:
            s_percent_smokers_data.append(smoker_list.strip().split(','))
            c_percent_cancer_data.append(cancer_list.strip().split(','))
    return (s_percent_smokers_data, c_percent_cancer_data)
def calc_corr(s_percent_smokers_data, c_percent_cancer_data):
    sum_smokers_value = sum_cancer_cases_values = 0
    sum_smokers_sq = sum_cancer_cases_sq = 0
    sum_value_products = 0
    numbers = len(s_percent_smokers_data)
    for k in range(0, numbers):
        sum_smokers_value = sum_smokers_value + float(s_percent_smokers_data[k][1])
        sum_cancer_cases_values = sum_cancer_cases_values + float(c_percent_cancer_data[k][1])
        sum_smokers_sq = sum_smokers_sq + float(s_percent_smokers_data[k][1]) ** 2
        sum_cancer_cases_sq = sum_cancer_cases_sq + float(c_percent_cancer_data[k][1]) ** 2
        # sum of the products x*y, as needed for the correlation numerator
        sum_value_products = sum_value_products + float(s_percent_smokers_data[k][1]) * float(c_percent_cancer_data[k][1])
    numerator_value = (numbers * sum_value_products) - (sum_smokers_value * sum_cancer_cases_values)
    denominator_value = math.sqrt(abs((numbers * sum_smokers_sq) - (sum_smokers_value ** 2)) * ((numbers * sum_cancer_cases_sq) - (sum_cancer_cases_values ** 2)))
    return numerator_value / denominator_value
main()

The values in each row of your data files are not comma separated, but rather tab separated. You need to change the ',' delimiter character you're splitting on for '\t'. Or perhaps use the csv module and tell it that your delimiter is '\t'. You can read more about the csv module in the documentation.
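As a rough sketch of the csv-module route, the example below reads tab-separated rows from an in-memory string. The sample rows and column names are invented for illustration, since the linked data files are not reproduced here.

```python
import csv
import io

# Invented sample data, tab separated, with a header line.
sample = "State\tPercent\nAlabama\t22.1\nAlaska\t20.4\n"

# Splitting on ',' leaves each line as a single field, so index 1 does not exist.
fields = "Alabama\t22.1".split(",")
print(len(fields))  # 1

# csv.reader with delimiter='\t' splits each row into its two fields.
reader = csv.reader(io.StringIO(sample), delimiter="\t")
next(reader)  # skip the header row
values = [float(row[1]) for row in reader]
print(values)  # [22.1, 20.4]
```

With a real file, the object returned by open() can be passed to csv.reader in place of the io.StringIO wrapper.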

Complex iteration over PyTables

If anyone here works with PyTables, maybe you could give me a clue about this complex expression, which is not working:
hdf5file = openFile("savedTable.h5", mode = 'r')
tab = hdf5file.getNode("/Data")
for i in xrange(1,10):
    result = [result + 1 for x in tab.where("""(col1== 1) & (col2 == 1) & (col3== i) & ((col4 == 1) | (col5 == 1) | (col6 == 1) | (col7== 1))""")]
All Spyder gives me is the usual "Invalid Syntax" message.
Please pay special attention to the loop "for i in ...." and to the "... & (col3==i)" part of the query. I do not know whether it can be written like that.

You're right, you can't do this:
for i in xrange(1,10):
    tab.where('col3 == i')
Instead, try:
for i in xrange(1,10):
    cond = 'col3 == %d' % i
    tab.where(cond)
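The same substitution extends to the full query from the question. The sketch below only builds the condition strings, so it runs without PyTables; the names TEMPLATE and build_condition are made up for this example, and each resulting string would be the argument passed to tab.where.

```python
# The question's query with a %d placeholder where the loop variable goes.
TEMPLATE = ("(col1 == 1) & (col2 == 1) & (col3 == %d) & "
            "((col4 == 1) | (col5 == 1) | (col6 == 1) | (col7 == 1))")

def build_condition(i):
    """Return the query string for one value of the loop variable (hypothetical helper)."""
    return TEMPLATE % i

for i in range(1, 4):
    # Each string can then be handed to tab.where(...) inside the loop.
    print(build_condition(i))
```

Building the string once per iteration keeps the loop variable out of the query text itself, which is the approach the answer above suggests.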
