Related
So I am currently preparing for a competition (Australian Informatics Olympiad) and in the training hub, there is a problem in AIO 2018 intermediate called Castle Cavalry. I finished it:
input = open("cavalryin.txt").read()
output = open("cavalryout.txt", "w")
squad = input.split()
total = squad[0]
squad.remove(squad[0])
squad_sizes = squad.copy()
squad_sizes = list(set(squad))
yn = []
for i in range(len(squad_sizes)):
n = squad.count(squad_sizes[i])
if int(squad_sizes[i]) == 1 and int(n) == int(total):
yn.append(1)
elif int(n) == int(squad_sizes[i]):
yn.append(1)
elif int(n) != int(squad_sizes[i]):
yn.append(2)
ynn = list(set(yn))
if len(ynn) == 1 and int(ynn[0]) == 1:
output.write("YES")
else:
output.write("NO")
output.close()
I submitted this code and I didn't pass because it was too slow, at 1.952secs. The time limit is 1.000 secs. I wasn't sure how I would shorten this, as to me it looks fine. PLEASE keep in mind I am still learning, and I am only an amateur. I started coding only this year, so if the answer is quite obvious, sorry for wasting your time 😅.
Thank you for helping me out!
One performance issue is calling int() over and over on the same entity, or on things that are already int:
if int(squad_sizes[i]) == 1 and int(n) == int(total):
elif int(n) == int(squad_sizes[i]):
elif int(n) != int(squad_sizes[i]):
if len(ynn) == 1 and int(ynn[0]) == 1:
But the real problem is your code doesn't work. And making it faster won't change that. Consider the input:
4
2
2
2
2
Your code will output "NO" (with missing newline) despite it being a valid configuration. This is due to your collapsing the squad sizes using set() early in your code. You've thrown away vital information and are only really testing a subset of the data. For comparison, here's my complete rewrite that I believe handles the input correctly:
with open("cavalryin.txt") as input_file:
string = input_file.read()
total, *squad_sizes = map(int, string.split())
success = True
while squad_sizes:
squad_size = squad_sizes.pop()
for _ in range(1, squad_size):
try:
squad_sizes.remove(squad_size) # eliminate n - 1 others like me
except ValueError:
success = False
break
else: # no break
continue
break
with open("cavalryout.txt", "w") as output_file:
print("YES" if success else "NO", file=output_file)
Note that I convert all the input to int early on so I don't have to consider that issue again. I don't know whether this will meet AIO's timing constraints.
I can see some things in there that might be inefficient, but the best way to optimize code is to profile it: run it with a profiler and sample data.
You can easily waste time trying to speed up parts that don't need it without having much effect. Read up on the cProfile module in the standard library to see how to do this and interpret the output. A profiling tutorial is probably too long to reproduce here.
My suggestions, without profiling,
squad.remove(squad[0])
Removing the start of a big list is slow, because the rest of the list has to be copied as it is shifted down. (Removing the end of the list is faster, because lists are typically backed by arrays that are overallocated (more slots than elements) anyway, to make .append()s fast, so it only has to decrease the length and can keep the same array.
It would be better to set this to a dummy value and remove it when you convert it to a set (sets are backed by hash tables, so removals are fast), e.g.
dummy = object()
squad[0] = dummy # len() didn't change. No shifting required.
...
squad_sizes = set(squad)
squad_sizes.remove(dummy) # Fast lookup by hash code.
Since we know these will all be strings, you can just use None instead of a dummy object, but the above technique works even when your list might contain Nones.
squad_sizes = squad.copy()
This line isn't required; it's just doing extra work. The set() already makes a shallow copy.
n = squad.count(squad_sizes[i])
This line might be the real bottleneck. It's effectively a loop inside a loop, so it basically has to scan the whole list for each outer loop. Consider using collections.Counter for this task instead. You generate the count table once outside the loop, and then just look up the numbers for each string.
You can also avoid generating the set altogether if you do this. Just use the Counter object's keys for your set.
Another point unrelated to performance. It's unpythonic to use indexes like [i] when you don't need them. A for loop can get elements from an iterable and assign them to variables in one step:
from collections import Counter
...
count_table = Counter(squad)
for squad_size, n in count_table.items():
...
You can collect all occurences of the preferred number for each knight in a dictionary.
Then test if the number of knights with a given preferred number is divisible by that number.
with open('cavalryin.txt', 'r') as f:
lines = f.readlines()
# convert to int
list_int = [int(a) for a in lines]
#initialise counting dictionary: key: preferred number, item: empty list to collect all knights with preferred number.
collect_dict = {a:[] for a in range(1,1+max(list_int[1:]))}
print(collect_dict)
# loop though list, ignoring first entry.
for a in list_int[1:]:
collect_dict[a].append(a)
# initialise output
out='YES'
for key, item in collect_dict.items():
# check number of items with preference for number is divisilbe
# by that number
if item: # if list has entries:
if (len(item) % key) > 0:
out='NO'
break
with open('cavalryout.txt', 'w') as f:
f.write(out)
Problem: Implement bubble sort in python. Don’t use Python’s built in sort or sorted. Assume your inputs will be sufficient for the memory you have.
Code :
def check_if_sorted(details):
"""compares two consecutive integers in a list and returns the list if sorted else get the list sorted by calling another function"""
i = 0
while i <= len(details) - 2:
if details[i] < details[i+1]:
status = True
i = i+1
print details
else:
bubble_sort(details)
if status :
return details
def bubble_sort(details):
"""compares two consecutive integers in a list and shifts the smaller one to left """
for i in range(len(details)-1):
if details[i] > details[i+1]:
temp = details[i]
details[i]= details[i+1]
details[i+1] = temp
return check_if_sorted(details)
sort_me = [11,127,56,2,1,5,7,9,11,65,12,24,76,87,123,65,8,32,86,123,67,1,67,92,72,39,49,12, 98,52,45,19,37,22,1,66,943,415,21,785,12,698,26,36,18,97,0,63,25,85,24,94,1501]
print sort_me
print bubble_sort(sort_me)
I have written the following code but it keeps on running even after sorting the list and then prints messsage "RuntimeError: maximum recursion depth exceeded while calling a Python object". How can I stop recursion after checking the list is sorted?
you don't need check_if_sorted(details) function. use try/except in your sort function to check IndexError and keep calling the bubble_sort() function.
def bubble_sort(details):
"""compares two consecutive integers in a list and shifts the smaller one to left """
for i in range(len(details)-1):
try:
if details[i] > details[i+1]:
temp = details[i]
details[i]= details[i+1]
details[i+1] = temp
bubble_sort(details)
except IndexError:
return
return details
sort_me = [11,127,56,2,1,5,7,9,11,65,12,24,76,87,123,65,8,32,86,123,67,1,67,92,72,39,49,12, 98,52,45,19,37,22,1,66,943,415,21,785,12,698,26,36,18,97,0,63,25,85,24,94,1501]
print(sort_me)
print(bubble_sort(sort_me))
In check_if_sorted you do not check for equal. This causes the duplicates in your list to trigger another (unneeded) calling to bubble_sort causing an infinite loop. To fix it change the comparison line in check_if_sorted to:
if details[i] <= details[i+1]:
Edit:
This solves the infinite loop, However your algorithm is very inefficient. To improve your algorithm I suggest a google search for 'bubble sort algorithm', and/or a discussion with your instructor
I'm creating objects derived from a rather large txt file. My code is working properly but takes a long time to run. This is because the elements I'm looking for in the first place are not ordered and not (necessarily) unique. For example I am looking for a digit-code that might be used twice in the file but could be in the first and the last row. My idea was to check how often a certain code is used...
counter=collections.Counter([l[3] for l in self.body])
...and then loop through the counter. Advance: if a code is only used once you don't have to iterate over the whole file. However You are stuck with a lot of iterations which makes the process really slow.
So my question really is: how can I improve my code? Another idea of course is to oder the data first. But that could take quite long as well.
The crucial part is this method:
def get_pc(self):
counter=collections.Counter([l[3] for l in self.body])
# This returns something like this {'187':'2', '199':'1',...}
pcode = []
#loop through entries of counter
for k,v in counter.iteritems():
i = 0
#find post code in body
for l in self.body:
if i == v:
break
# find fist appearence of key
if l[3] == k:
#first encounter...
if i == 0:
#...so create object
self.pc = CodeCana(k,l[2])
pcode.append(self.pc)
i += 1
# make attributes
self.pc.attr((l[0],l[1]),l[4])
if v <= 1:
break
return pcode
I hope the code explains the problem sufficiently. If not, let me know and I will expand the provided information.
You are looping over body way too many times. Collapse this into one loop, and track the CodeCana items in a dictionary instead:
def get_pc(self):
pcs = dict()
pcode = []
for l in self.body:
pc = pcs.get(l[3])
if pc is None:
pc = pcs[l[3]] = CodeCana(l[3], l[2])
pcode.append(pc)
pc.attr((l[0],l[1]),l[4])
return pcode
Counting all items first then trying to limit looping over body by that many times while still looping over all the different types of items defeats the purpose somewhat...
You may want to consider giving the various indices in l names. You can use tuple unpacking:
for foo, bar, baz, egg, ham in self.body:
pc = pcs.get(egg)
if pc is None:
pc = pcs[egg] = CodeCana(egg, baz)
pcode.append(pc)
pc.attr((foo, bar), ham)
but building body out of a namedtuple-based class would help in code documentation and debugging even more.
it an answer for Euler project #4.
A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 99.
Find the largest palindrome made from the product of two 3-digit numbers.
Answer:
906609
code is:
from multiprocessing import Pool
from itertools import product
def sym(lst):
rst=[]
for x,y in lst:
tmp=x*y
if rec(tmp):
rst.append(tmp)
return rst
def rec(num):
num=str(num)
if num == "".join(reversed(num)): return True
else: return False
if __name__ == "__main__":
pool=Pool(processes=8)
lst=product(xrange(100,1000),repeat=2)
rst=pool.map(sym,lst)
#rst=sym(lst)
print max(rst)
when I run this:
# TypeError:'int' object is not iterable
but I can't understand it...isn't list iterable? or is there an error in my code?
Your problem is with the function sym.
sym is being passed the first element of your product iterable. (e.g. lst = (100,100)). When you get to the for loop, you're iterating over lst and then trying to unpack it into two numbers -- equivalent to:
for x,y in (100,100):
...
Which fails for obvious reasons.
I think you probably want to get rid of the for loop all-together -- which was probably an artifact of your serial version.
def sym(lst):
x,y=lst
tmp=x*y
if rec(tmp):
return tmp
else:
return None #max will ignore None values since None > x is always False.
The traceback was somewhat cryptic -- Apparently the traceback gets returned to the Pool which then gets re-raised ... But the way it is done makes it a little difficult to track.
Sometimes, when debugging these things it is helpful to replace Pool.map() with the regular version of map. Then, any exceptions which get raised are raised on your main "thread" and the tracebacks can be a little easier to follow.
I have a python function (call it myFunction) that gets as input a list of numbers, and, following a complex calculation, returns back the result of the calculation (which is a number).
The function looks like this:
def myFunction( listNumbers ):
# initialize the result of the calculation
calcResult = 0
# looping through all indices, from 0 to the last one
for i in xrange(0, len(listNumbers), 1):
# some complex calculation goes here, changing the value of 'calcResult'
# let us now return the result of the calculation
return calcResult
I tested the function, and it works as expected.
Normally, myFunction is provided a listNumbers argument that contains 5,000,000 elements in it. As you may expect, the calculation takes time. I need this function to run as fast as possible
Here comes the challenge: assume that the time now is 5am, and that listNumbers contains just 4,999,999 values in it. Meaning, its LAST VALUE is not yet available. This value will only be available at 6am.
Obviously, we can do the following (1st mode): wait until 6am. Then, append the last value into listNumbers, and then, run myFunction. This solution works, BUT it will take a while before myFunction returns our calculated result (as we need to process the entire list of numbers, from the first element on). Remember, our goal is to get the results as soon as possible past 6am.
I was thinking about a more efficient way to solve this (2nd mode): since (at 5am) we have listNumbers with 4,999,999 values in it, let us immediately start running myFunction. Let us process whatever we can (remember, we don't have the last piece of data yet), and then -- exactly at 6am -- 'plug in' the new data piece -- and generate the computed result. This should be significantly faster, as most of the processing will be done BEFORE 6am, hence, we will only have to deal with the new data -- which means the computed result should be available immediately after 6am.
Let's suppose that there's no way for us to inspect the code of myFunction or modify it. Is there ANY programming technique / design idea that will allow us to take myFunction AS IS, and do something with it (without changing its code) so that we can have it operate in the 2nd mode, rather than the 1st one?
Please do not suggest using c++ / numpy + cython / parallel computing etc to solve this problem. The goal here is to see if there's any programming technique or design pattern that can be easily used to solve such problems.
You could use a generator as an input. The generator will only return when there is data available to process.
Update: thanks for the brilliant comment, I wanted to remove this entry :)
class lazylist(object):
def __init__(self):
self.cnt = 0
self.length = 5000000
def __iter__(self):
return self
def __len__(self):
return self.length
def next(self):
if self.cnt < self.length:
self.cnt += 1
#return data here or wait for it
return self.cnt #just return a counter for this example
else:
raise StopIteration()
def __getitem__(self, i):
#again, block till you have data.
return i+1 #simple counter
myFunction(lazylist())
Update: As you can see from the comments and other solutions your loop construct and len call causes a lot of headaches, if you can eliminate it you can use a lot more elegant solution. for e in li or enumerate is the pythonic way to go.
By "list of numbers", do you mean an actual built-in list type?
If not, it's simple. Python uses duck-typing, so passing any sequence that supports iteration will do. Use the yield keyword to pass a generator.
def delayed_list():
for val in numpy_array[:4999999]:
yield val
wait_until_6am()
yield numpy_array[4999999]
and then,
myFunction(delayed_list())
If yes, then it's trickier :)
Also, check out PEP8 for recommended Python code style:
no spaces around brackets
my_function instead of myFunction
for i, val in enumerate(numbers): instead of for i in xrange(0, len(listNumbers), 1): etc.
subclass list so that when the function tries to read the last value it blocks until another thread provides the value.
import threading
import time
class lastblocks(list):
def __init__(self,*args,**kwargs):
list.__init__(self,*args,**kwargs)
self.e = threading.Event()
def __getitem__(self, index):
v1 = list.__getitem__(self,index)
if index == len(self)-1:
self.e.wait()
v2 = list.__getitem__(self,index)
return v2
else:
return v1
l = lastblocks(range(5000000-1)+[None])
def reader(l):
s = 0
for i in xrange(len(l)):
s += l[i]
print s
def writer(l):
time.sleep(10)
l[5000000-1]=5000000-1
l.e.set()
print "written"
reader = threading.Thread(target=reader, args=(l,))
writer = threading.Thread(target=writer, args=(l,))
reader.start()
writer.start()
prints:
written
12499997500000
for numpy:
import threading
import time
import numpy as np
class lastblocks(np.ndarray):
def __new__(cls, arry):
obj = np.asarray(arry).view(cls)
obj.e = threading.Event()
return obj
def __array_finalize__(self, obj):
if obj is None: return
self.e = getattr(obj, 'e', None)
def __getitem__(self, index):
v1 = np.ndarray.__getitem__(self,index)
if index == len(self)-1:
self.e.wait()
v2 = np.ndarray.__getitem__(self,index)
return v2
else:
return v1
l = lastblocks(np.asarray(range(5000000-1)+[None]))
def reader(l):
s = 0
for i in xrange(len(l)):
s += l[i]
print s
def writer(l):
time.sleep(10)
l[5000000-1]=5000000-1
l.e.set()
print "written"
reader = threading.Thread(target=reader, args=(l,))
writer = threading.Thread(target=writer, args=(l,))
reader.start()
writer.start()
Memory protection barriers are a general way to solve this type of problem when the techniques suggested in the other answers (generators and mock objects) are unavailable.
A memory barrier is a hardware feature that causes an interrupt when a program tries to access a forbidden area of memory (usually controllable at the page level). The interrupt handler can then take appropriate action, for example suspending the program until the data is ready.
So in this case you'd set up a barrier on the last page of the list, and the interrupt handler would wait until 06:00 before allowing the program to continue.
You could just create your own iterator to iterate over the 5,000,000 elements. This would do whatever you need to do to wait around for the final element (can't be specific since the example in the question is rather abstract). I'm assuming you don't care about the code hanging until 6:00, or know how to do it in a background thread.
More information about writing your own iterator is at http://docs.python.org/library/stdtypes.html#iterator-types
There is a simpler generator solution:
def fnc(lst):
result = 0
index = 0
while index < len(lst):
while index < len(lst):
... do some manipulations here ...
index += 1
yield result
lst = [1, 2, 3]
gen = fnc(lst)
print gen.next()
lst.append(4)
print gen.next()
I'm a little bit confused about not being able to investigate myFunction. At least you have to know if your list is being iterated or accessed by index. Your example might suggest an index is used. If you want to take advantage of iterators/generators, you have to iterate. I know you said myFunction is unchangeable, but just want to point out, that most pythonic version would be:
def myFunction( listNumbers ):
calcResult = 0
# enumerate if you really need an index of element in array
for n,v in enumerate(listNumbers):
# some complex calculation goes here, changing the value of 'calcResult'
return calcResult
And now you can start introducing nice ideas. One is probably wrapping list with your own type and provide __iter__ method (as a generator); you could return value if accessible, wait for more data if you expect any or return after yielding last element.
If you have to access list by index, you can use __getitem__ as in Dan D's example. It'll have a limitation though, and you'll have to know the size of array in advance.
Couldn't you simply do something like this:
processedBefore6 = myFunction([1,2,3]) # the first 4,999,999 vals.
while lastVal.notavailable:
sleep(1)
processedAfter6 = myFunction([processedBefore6, lastVal])
If the effects are linear (step 1 -> step 2 -> step 3, etc) this should allow you to do as much work as possible up front, then catch the final value when it's available and finish up.