Perhaps I fundamentally misunderstand indentation in python? - Python - python

This code gives me an indentation error on checks. I get that this happens often, but the instance is in between two for loops that exist because I need to reference two different lists.
I do not even have the data set made yet, but it should report that the syntax is correct at least. The code is fairly simple. I want to automate package placement in a building and I want to do so by taking the biggest packages and putting them in place with the least amount of room where it would still fit.
All inputs that I used so far are dictionaries because I need to know which shelf I am referring too. I am this close to turning it to lists and being extremely strict about formatting.
inv = maxkey["Inventory"]
is the line where the mistake happens. I do not know how to fix it. Should I use lists for this project instead? Is there a flaw in the logic? Is there a parentheses I forgot? Please let me know if this is just an oversight on my part. Please contact me for further details.
def loadOrder(inProd, units, loc, pref, shelves):
items = len(inProd)
while items > 0
# What is the biggest package in the list?
mxw = 0 # Frontal area trackers
BoxId = {} # Identifies what is being selected
for p in inProd:
if p["Height"]*p["Width"] > mxw:
mxw = p["Width"]*p["Height"]
BoxId = p
else:
pass
# What is the location with the least amount of space?
maxi = 0.001
maxkey = {}
for key in loc:
if key["Volume Efficiency"] > maxi and key["Width"] > mxw/BoxId["Height"]:
maxi = key["Volume Efficiency"]
maxkey = key
else:
pass
maxkey["Inventory"].append(BoxId)
weight = 0
volTot = 0
usedL = 0
inv = maxkey["Inventory"]
for k in inv:
weight = k['Weight']+weight
vol = k['Height']*k['Width']*k['Depth']+volTot
usedL = k['Width']+usedL
maxkey["Volume Efficiency"] = volTot/(maxkey['Height']*maxkey['Weight']*maxkey['Depth'])
maxkey['Width Remaining'] = usedL
maxkey['Capacity Remaining'] = weight
del inProd[BoxId]
items = len(inProd)
return [inProd, units, loc, pref, shelves]

Indentation in a function definition should be like:
def function-name():
<some code>
<return something>
Also, you have missed : after while loop condition.
It shoulde be while items > 0:

And you should not mixing the use of tabs and spaces for indentation.
The standard way for indentation is 4 spaces.
you can see more in PEP 8.

Related

AIO Castle Cavalry - My code is too slow, is there a way I can shorten this?

So I am currently preparing for a competition (Australian Informatics Olympiad) and in the training hub, there is a problem in AIO 2018 intermediate called Castle Cavalry. I finished it:
input = open("cavalryin.txt").read()
output = open("cavalryout.txt", "w")
squad = input.split()
total = squad[0]
squad.remove(squad[0])
squad_sizes = squad.copy()
squad_sizes = list(set(squad))
yn = []
for i in range(len(squad_sizes)):
n = squad.count(squad_sizes[i])
if int(squad_sizes[i]) == 1 and int(n) == int(total):
yn.append(1)
elif int(n) == int(squad_sizes[i]):
yn.append(1)
elif int(n) != int(squad_sizes[i]):
yn.append(2)
ynn = list(set(yn))
if len(ynn) == 1 and int(ynn[0]) == 1:
output.write("YES")
else:
output.write("NO")
output.close()
I submitted this code and I didn't pass because it was too slow, at 1.952secs. The time limit is 1.000 secs. I wasn't sure how I would shorten this, as to me it looks fine. PLEASE keep in mind I am still learning, and I am only an amateur. I started coding only this year, so if the answer is quite obvious, sorry for wasting your time 😅.
Thank you for helping me out!
One performance issue is calling int() over and over on the same entity, or on things that are already int:
if int(squad_sizes[i]) == 1 and int(n) == int(total):
elif int(n) == int(squad_sizes[i]):
elif int(n) != int(squad_sizes[i]):
if len(ynn) == 1 and int(ynn[0]) == 1:
But the real problem is your code doesn't work. And making it faster won't change that. Consider the input:
4
2
2
2
2
Your code will output "NO" (with missing newline) despite it being a valid configuration. This is due to your collapsing the squad sizes using set() early in your code. You've thrown away vital information and are only really testing a subset of the data. For comparison, here's my complete rewrite that I believe handles the input correctly:
with open("cavalryin.txt") as input_file:
string = input_file.read()
total, *squad_sizes = map(int, string.split())
success = True
while squad_sizes:
squad_size = squad_sizes.pop()
for _ in range(1, squad_size):
try:
squad_sizes.remove(squad_size) # eliminate n - 1 others like me
except ValueError:
success = False
break
else: # no break
continue
break
with open("cavalryout.txt", "w") as output_file:
print("YES" if success else "NO", file=output_file)
Note that I convert all the input to int early on so I don't have to consider that issue again. I don't know whether this will meet AIO's timing constraints.
I can see some things in there that might be inefficient, but the best way to optimize code is to profile it: run it with a profiler and sample data.
You can easily waste time trying to speed up parts that don't need it without having much effect. Read up on the cProfile module in the standard library to see how to do this and interpret the output. A profiling tutorial is probably too long to reproduce here.
My suggestions, without profiling,
squad.remove(squad[0])
Removing the start of a big list is slow, because the rest of the list has to be copied as it is shifted down. (Removing the end of the list is faster, because lists are typically backed by arrays that are overallocated (more slots than elements) anyway, to make .append()s fast, so it only has to decrease the length and can keep the same array.
It would be better to set this to a dummy value and remove it when you convert it to a set (sets are backed by hash tables, so removals are fast), e.g.
dummy = object()
squad[0] = dummy # len() didn't change. No shifting required.
...
squad_sizes = set(squad)
squad_sizes.remove(dummy) # Fast lookup by hash code.
Since we know these will all be strings, you can just use None instead of a dummy object, but the above technique works even when your list might contain Nones.
squad_sizes = squad.copy()
This line isn't required; it's just doing extra work. The set() already makes a shallow copy.
n = squad.count(squad_sizes[i])
This line might be the real bottleneck. It's effectively a loop inside a loop, so it basically has to scan the whole list for each outer loop. Consider using collections.Counter for this task instead. You generate the count table once outside the loop, and then just look up the numbers for each string.
You can also avoid generating the set altogether if you do this. Just use the Counter object's keys for your set.
Another point unrelated to performance. It's unpythonic to use indexes like [i] when you don't need them. A for loop can get elements from an iterable and assign them to variables in one step:
from collections import Counter
...
count_table = Counter(squad)
for squad_size, n in count_table.items():
...
You can collect all occurences of the preferred number for each knight in a dictionary.
Then test if the number of knights with a given preferred number is divisible by that number.
with open('cavalryin.txt', 'r') as f:
lines = f.readlines()
# convert to int
list_int = [int(a) for a in lines]
#initialise counting dictionary: key: preferred number, item: empty list to collect all knights with preferred number.
collect_dict = {a:[] for a in range(1,1+max(list_int[1:]))}
print(collect_dict)
# loop though list, ignoring first entry.
for a in list_int[1:]:
collect_dict[a].append(a)
# initialise output
out='YES'
for key, item in collect_dict.items():
# check number of items with preference for number is divisilbe
# by that number
if item: # if list has entries:
if (len(item) % key) > 0:
out='NO'
break
with open('cavalryout.txt', 'w') as f:
f.write(out)

Self Limiting Repition Function

I'm writing a program that is basically a study guide/ practice test for the current section of my A&P class (it keeps me more engaged than just rereading notes over and over). The test works without any problems, but I have an issue where some of my questions use an "enterbox" input, I can have the question loop if the answer is incorrect, but I can't get it to break without a correct answer.
I figured out a way to make it work by putting the entire function back into the initial "else" tree, so that right or wrong you advance to the next question but it looks incredibly ugly and I can't believe there isn't a better way.
So my "solution" looks like such:
def question82():
x = "which type of metabolism provides the maximum amount of ATP needed for contraction?"
ques82 = enterbox(msg = x, title = version)
#version is a variable defined earlier
if ques82.lower() in ["aerobic"]:
add() #a function that is explained in the example further below
question83()
else:
loss() #again its a housecleaning function shown below
ques82b = enterbox(msg = x, title = version)
if ques82b.lower() in ["aerobic"]:
add()
question83()
else:
loss()
question83()
Okay so it worked, but using a nested if tree for each "enterbox" question looks kinda sloppy. I'm self taught so it may be the only solution but if there is something better I would love to learn about it.
So here is a complete section from my program:
from easygui import *
import sys
version = 'A&P EXAM 3 REVIEW'
points = 0
def add():
global points
msgbox("Correct", title = version)
points = points + 1
def loss():
global points
msgbox("Try Again", title = version)
points = points - 1
def question81():
x = "What chemical is stored by muscle as a source of readily available energy for muscle contractions"
ques81 = enterbox(msg = x, title = version)
if ques81.lower() in ["creatine"]:
add()
question82()
else:
loss()
question81()
It works as is so any errors from what's provided are probably my fault from copy and pasting.
Also I'm running it in python 2.7rc1 if that helps.
Thanks for any help in advance.
I don't know if there is a way to combine "enterbox" that has a button for "skip" as that would also be a solution.
Consider the following approach:
We define a list of question and answer pairs. We do this in one place so it's easy to maintain and we don't have to search all over the file to make changes or re-use this code for a different questionset.
We create an ask_question function that we can call for all of our questions. This way, if we want to make a change about how we implement our question logic, we only have to make it in one spot (and not in each of the questionXX functions).
We compare user input to our answer using == and not in (in will do something else, not what you expect).
We create an object to keep track of our answer results. Here, it's an instance of ResultsStore, but it can be anything really, let's just try to get away from global variables.
Use a loop when prompting for answers. The loop will repeat if the answer given was incorrect (and if retry_on_fail is False).
Allow for the user to enter some "skip" keyword to skip the question.
Display the results once the "test" is complete. Here, we do that by defining and calling the store.display_results() method.
So, what about:
from easygui import enterbox
question_answer_pairs = [
("1 + 1 = ?", "2"),
("2 * 3 = ?", "6"),
("which type of metabolism provides the maximum amount of ATP needed for contraction?", "aerobic")
]
VERSION = 'A&P EXAM 3 REVIEW'
class ResultStore:
def __init__(self):
self.num_correct = 0
self.num_skipped = 0
self.num_wrong = 0
def show_results(self):
print("Results:")
print(" Correct:", self.num_correct)
print(" Skipped:", self.num_skipped)
print(" Wrong: ", self.num_wrong)
def ask_question(q, a, rs, retry_on_fail=True):
while True:
resp = enterbox(msg=q, title=VERSION)
# Force resp to be a string if nothing is entered (so .lower() doesn't throw)
if resp is None: resp = ''
if resp.lower() == a.lower():
rs.num_correct += 1
return True
if resp.lower() == "skip":
rs.num_skipped += 1
return None
# If we get here, we haven't returned (so the answer was neither correct nor
# "skip"). Increment num_wrong and check whether we should repeat.
rs.num_wrong += 1
if retry_on_fail is False:
return False
# Create a ResultsStore object to keep track of how we did
store = ResultStore()
# Ask questions
for (q,a) in question_answer_pairs:
ask_question(q, a, store)
# Display results (calling the .show_results() method on the ResultsStore object)
store.show_results()
Now, the return value currently doesn't do anything, but it could!
RES_MAP = {
True: "Correct!",
None: "(skipped)",
False: "Incorrect" # Will only be shown if retry_on_fail is False
}
for (q,a) in question_answer_pairs:
res = ask_question(q, a, store)
print(RES_MAP[res])
Quick and dirty solution could be using the default value "skip" for the answer:
def question81():
x = "What chemical is stored by muscle as a source of readily available energy for muscle contractions"
ques81 = enterbox(msg = x, title = version, default = "skip")
if ques81.lower() == 'creatine':
add()
question82()
elif ques81 == 'skip':
# Do something
else:
loss()
question81()
But you should really study the answer given by jedwards. There's a lot to learn about
program design. He's not giving you the fish, he's teaching you to fish.

Converting Fortran 77 code to Python

I have looked at this Q/A Intent of this Fotran77 code and I have almost converted the below Fortran77 style code into Python 3.x except I had a doubt where the i = i + 1 should be placed in the Python version. As mentioned in the comments of the linked question I have done the conformance tests and the results are off by a margin of 2. Hence the question.
i = 0
500 continue
i = i +1
if (i .le. ni) then
if (u(i,j-1) .gt. -9999.) then
r(1,j) = u(i,j-1)
go to 600
else
missing = i
go to 500
end if
end if
600 continue
Here is my Python version
i = 0
while (i <= ni):
i = i+1
if (u[i,j-1] > -9999.0):
r[0,j] = u[i,j-1]
break
else:
missing = i
Did I place the increment counter at the right location ?
Directly translating is not advised because you loose a number of nice efficient coding features of python.
To do this properly in python you should 1) recognize the 0- index convention of python, and 2 ) recognize that that fortran is column major and python is row major so you should reverse the index ordering for all multi-dimensional arrays.
If you do that the loop can be written:
try:
r[j,0]=[val for val in u[j] if val > -9999 ][0]
missing=False
except:
missing=True
I'm assuming we don't actually need the numeric value of missing.
If you need it you will have something like this:
try:
missing,r[j,0]=[(index,val) for (index,val) in enumerate(u[j]) if val > -9999 ][0]
except:
missing=-1
You could also use next which would be faster, but it gets a little trickier handling the missing condition.

How to match fields from two lists and further filter based upon the values in subsequent fields?

EDIT: My question was answered on reddit. Here is the link if anyone is interested in the answer to this problem https://www.reddit.com/r/learnpython/comments/42ibhg/how_to_match_fields_from_two_lists_and_further/
I am attempting to get the pos and alt strings from file1 to match up with what is in
file2, fairly simple. However, file2 has values in the 17th split element/column to the
last element/column (340th) which contains string such as 1/1:1.2.2:51:12 which
I also want to filter for.
I want to extract the rows from file2 that contain/match the pos and alt from file1.
Thereafter, I want to further filter the matched results that only contain certain
values in the 17th split element/column onwards. But to do so the values would have to
be split by ":" so I can filter for split[0] = "1/1" and split[2] > 50. The problem is
I have no idea how to do this.
I imagine I will have to iterate over these and split but I am not sure how to do this
as the code is presently in a loop and the values I want to filter are in columns not rows.
Any advice would be greatly appreciated, I have sat with this problem since Friday and
have yet to find a solution.
import os,itertools,re
file1 = open("file1.txt","r")
file2 = open("file2.txt","r")
matched = []
for (x),(y) in itertools.product(file2,file1):
if not x.startswith("#"):
cells_y = y.split("\t")
pos_y = cells[0]
alt_y = cells[3]
cells_x = x.split("\t")
pos_x = cells_x[0]+":"+cells_x[1]
alt_x = cells_x[4]
if pos_y in pos_x and alt_y in alt_x:
matched.append(x)
for z in matched:
cells_z = z.split("\t")
if cells_z[16:len(cells_z)]:
Your requirement is not clear, but you might mean this:
for (x),(y) in itertools.product(file2,file1):
if x.startswith("#"):
continue
cells_y = y.split("\t")
pos_y = cells[0]
alt_y = cells[3]
cells_x = x.split("\t")
pos_x = cells_x[0]+":"+cells_x[1]
alt_x = cells_x[4]
if pos_y != pos_x: continue
if alt_y != alt_x: continue
extra_match = False
for f in range(17, 341):
y_extra = y[f].split(':')
if y_extra[0] != '1/1': continue
if y_extra[2] <= 50: continue
extra_match = True
break
if not extra_match: continue
xy = x + y
matched.append(xy)
I chose to concatenate x and y into the matched array, since I wasn't sure whether or not you would want all the data. If not, feel free to go back to just appending x or y.
You may want to look into the csv library, which can use tab as a delimiter. You can also use a generator and/or guards to make the code a bit more pythonic and efficient. I think your approach with indexes works pretty well, but it would be easy to break when trying to modify down the road, or to update if your file lines change shape. You may wish to create objects (I use NamedTuples in the last part) to represent your lines and make it much easier to read/refine down the road.
Lastly, remember that Python has a shortcut feature with the comparative 'if'
for example:
if x_evaluation and y_evaluation:
do some stuff
when x_evaluation returns False, Python will skip y_evaluation entirely. In your code, cells_x[0]+":"+cells_x[1] is evaluated every single time you iterate the loop. Instead of storing this value, I wait until the easier alt comparison evaluates to True before doing this (comparatively) heavier/uglier check.
import csv
def filter_matching_alt_and_pos(first_file, second_file):
for x in csv.reader(open(first_file, 'rb'), delimiter='\t'):
for y in csv.reader(open(second_file, 'rb'), delimiter='\t'):
# continue will skip the rest of this loop and go to the next value for y
# this way, we can abort as soon as one value isn't what we want
# .. todo:: we could make a filter function and even use the filter() built-in depending on needs!
if x[3] == y[4] and x[0] == ":".join(y[:1]):
yield x
def match_datestamp_and_alt_and_pos(first_file, second_file):
for z in filter_matching_alt_and_pos(first_file, second_file):
for element in z[16:]:
# I am not sure I fully understood your filter needs for the 2nd half. Here, I split all elements from the 17th onward and look for the two cases you mentioned. This seems like it might be very heavy, but at least we're using generators!
# same idea as before, we abort as early as possible to avoid needless indexing and checks
for chunk in element.split(":"):
# WARNING: if you aren't 100% sure the 2nd element is an int, this is very dangerous
# here, I use the continue keyword and the negative-check to help eliminate excess overhead. The execution is very similar as above, but might be easier to read/understand and can help speed things along in some cases
# once again, I do the lighter check before the heavier one
if not int(chunk[2])> 50:
# continue automatically skips to the next iteration on element
continue
if not chunk[:1] == "1/1":
continue
yield z
if __name__ == '__main__':
first_file = "first.txt"
second_file = "second.txt"
# match_datestamp_and_alt_and_pos returns a generator; for loop through it for the lines which matched all 4 cases
match_datestamp_and_alt_and_pos(first_file=first_file, second_file=second_file)
namedtuples for the first part
from collections import namedtuple
FirstFileElement = namedtuple("FirstFrameElement", "pos unused1 unused2 alt")
SecondFileElement = namedtuple("SecondFrameElement", "pos1 pos2 unused2 unused3 alt")
def filter_matching_alt_and_pos(first_file, second_file):
for x in csv.reader(open(first_file, 'rb'), delimiter='\t'):
for y in csv.reader(open(second_file, 'rb'), delimiter='\t'):
# continue will skip the rest of this loop and go to the next value for y
# this way, we can abort as soon as one value isn't what we want
# .. todo:: we could make a filter function and even use the filter() built-in depending on needs!
x_element = FirstFileElement(*x)
y_element = SecondFileElement(*y)
if x.alt == y.alt and x.pos == ":".join([y.pos1, y.pos2]):
yield x

Python: How to speed up creating of objects?

I'm creating objects derived from a rather large txt file. My code is working properly but takes a long time to run. This is because the elements I'm looking for in the first place are not ordered and not (necessarily) unique. For example I am looking for a digit-code that might be used twice in the file but could be in the first and the last row. My idea was to check how often a certain code is used...
counter=collections.Counter([l[3] for l in self.body])
...and then loop through the counter. Advance: if a code is only used once you don't have to iterate over the whole file. However You are stuck with a lot of iterations which makes the process really slow.
So my question really is: how can I improve my code? Another idea of course is to oder the data first. But that could take quite long as well.
The crucial part is this method:
def get_pc(self):
counter=collections.Counter([l[3] for l in self.body])
# This returns something like this {'187':'2', '199':'1',...}
pcode = []
#loop through entries of counter
for k,v in counter.iteritems():
i = 0
#find post code in body
for l in self.body:
if i == v:
break
# find fist appearence of key
if l[3] == k:
#first encounter...
if i == 0:
#...so create object
self.pc = CodeCana(k,l[2])
pcode.append(self.pc)
i += 1
# make attributes
self.pc.attr((l[0],l[1]),l[4])
if v <= 1:
break
return pcode
I hope the code explains the problem sufficiently. If not, let me know and I will expand the provided information.
You are looping over body way too many times. Collapse this into one loop, and track the CodeCana items in a dictionary instead:
def get_pc(self):
pcs = dict()
pcode = []
for l in self.body:
pc = pcs.get(l[3])
if pc is None:
pc = pcs[l[3]] = CodeCana(l[3], l[2])
pcode.append(pc)
pc.attr((l[0],l[1]),l[4])
return pcode
Counting all items first then trying to limit looping over body by that many times while still looping over all the different types of items defeats the purpose somewhat...
You may want to consider giving the various indices in l names. You can use tuple unpacking:
for foo, bar, baz, egg, ham in self.body:
pc = pcs.get(egg)
if pc is None:
pc = pcs[egg] = CodeCana(egg, baz)
pcode.append(pc)
pc.attr((foo, bar), ham)
but building body out of a namedtuple-based class would help in code documentation and debugging even more.

Categories

Resources