Python: How to speed up creating objects?

I'm creating objects derived from a rather large txt file. My code works properly but takes a long time to run. This is because the elements I'm looking for are not ordered and not (necessarily) unique. For example, I am looking for a digit code that might appear twice in the file, once in the first row and once in the last. My idea was to check how often a certain code is used...
counter=collections.Counter([l[3] for l in self.body])
...and then loop through the counter. The advantage: if a code is used only once, you don't have to iterate over the whole file. However, you are still stuck with a lot of iterations, which makes the process really slow.
So my question really is: how can I improve my code? Another idea, of course, is to order the data first, but that could take quite long as well.
The crucial part is this method:
def get_pc(self):
    counter = collections.Counter([l[3] for l in self.body])
    # This returns something like Counter({'187': 2, '199': 1, ...})
    pcode = []
    # loop through entries of counter
    for k, v in counter.iteritems():
        i = 0
        # find post code in body
        for l in self.body:
            if i == v:
                break
            # find first appearance of key
            if l[3] == k:
                # first encounter...
                if i == 0:
                    # ...so create object
                    self.pc = CodeCana(k, l[2])
                    pcode.append(self.pc)
                i += 1
                # make attributes
                self.pc.attr((l[0], l[1]), l[4])
                if v <= 1:
                    break
    return pcode
I hope the code explains the problem sufficiently. If not, let me know and I will expand the provided information.

You are looping over body way too many times. Collapse this into one loop, and track the CodeCana items in a dictionary instead:
def get_pc(self):
    pcs = dict()
    pcode = []
    for l in self.body:
        pc = pcs.get(l[3])
        if pc is None:
            pc = pcs[l[3]] = CodeCana(l[3], l[2])
            pcode.append(pc)
        pc.attr((l[0], l[1]), l[4])
    return pcode
Counting all items first then trying to limit looping over body by that many times while still looping over all the different types of items defeats the purpose somewhat...
You may want to consider giving the various indices in l names. You can use tuple unpacking:
for foo, bar, baz, egg, ham in self.body:
    pc = pcs.get(egg)
    if pc is None:
        pc = pcs[egg] = CodeCana(egg, baz)
        pcode.append(pc)
    pc.attr((foo, bar), ham)
But building body out of a namedtuple-based class would help code documentation and debugging even more.
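A minimal sketch of the single-pass version combined with a namedtuple, assuming five fields per row. The field names (x, y, name, code, value) and the stub CodeCana class are hypothetical stand-ins: the question only shows l[0] through l[4], a CodeCana(code, l[2]) constructor and an .attr(key, value) method.

```python
from collections import namedtuple

# Hypothetical field names for l[0]..l[4]; rename them to match the real file
Row = namedtuple("Row", ["x", "y", "name", "code", "value"])

class CodeCana:
    """Hypothetical stand-in for the question's class."""

    def __init__(self, code, name):
        self.code = code
        self.name = name
        self.attrs = {}

    def attr(self, key, value):
        self.attrs[key] = value

def get_pc(body):
    """Create one CodeCana per distinct code in a single pass over body."""
    pcs = {}
    for row in body:
        pc = pcs.get(row.code)
        if pc is None:
            pc = pcs[row.code] = CodeCana(row.code, row.name)
        pc.attr((row.x, row.y), row.value)
    # dicts keep insertion order (Python 3.7+): objects come out in first-appearance order
    return list(pcs.values())

body = [
    Row("1", "2", "Alpha", "187", "a"),
    Row("3", "4", "Beta", "199", "b"),
    Row("5", "6", "Alpha", "187", "c"),
]
pcode = get_pc(body)
for pc in pcode:
    print(pc.code, pc.name, pc.attrs)
# 187 Alpha {('1', '2'): 'a', ('5', '6'): 'c'}
# 199 Beta {('3', '4'): 'b'}
```

To build body from the file, Row._make(fields) turns a list of five fields (for example line.split()) into a Row. Because the dict already preserves first-appearance order, the separate pcode list becomes optional.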


Runtime Error (Python3) when you manipulate lists with very long strings

I wrote Python 3 code to manipulate lists of strings, but it gives a Runtime Error for long strings. Here is my code for the problem:
string = "BANANA"
slist = list(string)
mark = list(range(len(slist)))
vowel_substrings = list()
consonants_substrings = list()
#print(mark)
for i in range(len(slist)):
    if slist[i]=='A' or slist[i]=='E' or slist[i]=='I' or slist[i]=='O' or slist[i]=='U':
        mark[i] = 1
    else:
        mark[i] = 0
#print(mark)
for j in range(len(slist)):
    if mark[j] == 1:
        for l in range(j,len(string)):
            vowel_substrings.append(string[j:l+1])
            #print(string[j:l+1])
    else:
        for l in range(j,len(string)):
            consonants_substrings.append(string[j:l+1])
#print(consonants_substrings)
unique_consonants = list(set(consonants_substrings))
unique_vowels = list(set(vowel_substrings))
##add two lists
all_substrings = consonants_substrings+(vowel_substrings)
#print(all_substrings)
##Find points earned by vowel guy and consonant guy
vowel_guy_score = 0
consonant_guy_score = 0
for strng in unique_vowels:
    vowel_guy_score += vowel_substrings.count(strng)
for strng in unique_consonants:
    consonant_guy_score += consonants_substrings.count(strng)
#print(vowel_guy_score) #Kevin
#print(consonant_guy_score) #Stuart
if vowel_guy_score > consonant_guy_score:
    print("Kevin ",vowel_guy_score)
elif vowel_guy_score < consonant_guy_score:
    print("Stuart ",consonant_guy_score)
else:
    print("Draw")
This gives the right answer. But with a long string, like the one below, it fails.
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
I think initialization or memory allocation might be a problem but I don't know how to allocate memory before even knowing how much memory the code will need. Thank you in advance for any help you can provide.
In the middle there, you generate a data structure of size O(n³): for each starting position × each ending position × length of the substring. That's probably where your memory problems appear (you haven't posted a traceback).
One possible optimisation: instead of building a list of substrings and then generating the set, use a Counter. That tells you how many times each substring appears without storing all the copies:
import collections

vowel_substrings = collections.Counter()
consonants_substrings = collections.Counter()
for j in range(len(slist)):
    if mark[j] == 1:
        for l in range(j, len(string)):
            vowel_substrings[string[j:l+1]] += 1
            #print(string[j:l+1])
    else:
        for l in range(j, len(string)):
            consonants_substrings[string[j:l+1]] += 1
Even better would be to calculate the scores as you go along, without storing any of the substrings. If I'm reading the code correctly, the substrings aren't actually used for anything: each letter is effectively scored based on its distance from the end of the string, and the scores are added up. This can be calculated in a single pass through the string, without making any additional copies or keeping track of anything other than the cumulative scores and the length of the string.
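A minimal sketch of that single pass, assuming the rules the question's code implies: a substring scores for the vowel player if it starts with A, E, I, O or U, otherwise for the consonant player. A substring starting at index i can end at any of len(string) - i positions, so each position contributes exactly that many points. The function names are illustrative, not from the original.

```python
VOWELS = frozenset("AEIOU")

def minion_scores(string):
    """Return (vowel_score, consonant_score) in O(n) time and O(1) extra memory."""
    n = len(string)
    vowel_score = consonant_score = 0
    for i, letter in enumerate(string):
        # Substrings starting here: string[i:i+1], string[i:i+2], ..., string[i:n]
        if letter in VOWELS:
            vowel_score += n - i
        else:
            consonant_score += n - i
    return vowel_score, consonant_score

def winner(string):
    vowel_score, consonant_score = minion_scores(string)
    if vowel_score > consonant_score:
        return "Kevin {}".format(vowel_score)
    if vowel_score < consonant_score:
        return "Stuart {}".format(consonant_score)
    return "Draw"

print(winner("BANANA"))  # Stuart 12
```

Since nothing but two integers is stored, the long input from the question is handled in one linear scan.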

Perhaps I fundamentally misunderstand indentation in Python?

This code gives me an indentation error when I check it. I know this happens often, but here the error occurs between two for loops, which exist because I need to reference two different lists.
I do not even have the data set made yet, but it should at least report that the syntax is correct. The code is fairly simple. I want to automate package placement in a building, and I want to do so by taking the biggest packages and putting them in the place with the least amount of room where they would still fit.
All inputs that I have used so far are dictionaries, because I need to know which shelf I am referring to. I am this close to turning them into lists and being extremely strict about formatting.
inv = maxkey["Inventory"]
is the line where the mistake happens. I do not know how to fix it. Should I use lists for this project instead? Is there a flaw in the logic? Is there a parenthesis I forgot? Please let me know if this is just an oversight on my part. Please contact me for further details.
def loadOrder(inProd, units, loc, pref, shelves):
    items = len(inProd)
    while items > 0
        # What is the biggest package in the list?
        mxw = 0 # Frontal area trackers
        BoxId = {} # Identifies what is being selected
        for p in inProd:
            if p["Height"]*p["Width"] > mxw:
                mxw = p["Width"]*p["Height"]
                BoxId = p
            else:
                pass
        # What is the location with the least amount of space?
        maxi = 0.001
        maxkey = {}
        for key in loc:
            if key["Volume Efficiency"] > maxi and key["Width"] > mxw/BoxId["Height"]:
                maxi = key["Volume Efficiency"]
                maxkey = key
            else:
                pass
        maxkey["Inventory"].append(BoxId)
        weight = 0
        volTot = 0
        usedL = 0
        inv = maxkey["Inventory"]
        for k in inv:
            weight = k['Weight']+weight
            vol = k['Height']*k['Width']*k['Depth']+volTot
            usedL = k['Width']+usedL
        maxkey["Volume Efficiency"] = volTot/(maxkey['Height']*maxkey['Weight']*maxkey['Depth'])
        maxkey['Width Remaining'] = usedL
        maxkey['Capacity Remaining'] = weight
        del inProd[BoxId]
        items = len(inProd)
    return [inProd, units, loc, pref, shelves]
Indentation in a function definition should be like:
def function_name():
    <some code>
    <return something>
Also, you are missing the colon (:) after the while loop condition.
It should be while items > 0:
And you should not mix tabs and spaces for indentation.
The standard way to indent is 4 spaces.
You can read more in PEP 8.
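Both problems can be caught without running the program. A small sketch using the built-in compile(), which parses source and raises SyntaxError (or its subclasses IndentationError and TabError) on exactly these mistakes. The check_syntax helper is hypothetical; running python -m py_compile on the file gives a similar check from the command line.

```python
def check_syntax(source):
    """Return None if source parses, otherwise the exception type name and line number."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:  # IndentationError and TabError are subclasses
        return "{} on line {}".format(type(exc).__name__, exc.lineno)
    return None

missing_colon = "while items > 0\n    items -= 1\n"
mixed_indent = "if True:\n\tx = 1\n        y = 2\n"  # one tab, then eight spaces
fixed = "while items > 0:\n    items -= 1\n"

for name, src in [("missing colon", missing_colon),
                  ("mixed indent", mixed_indent),
                  ("fixed", fixed)]:
    print(name, "->", check_syntax(src))
```

In Python 3 the mixed case is reported as TabError even though both lines look aligned in an editor that displays a tab as eight columns.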

How can I keep track of what combinations have been tried in a brute force approach?

I'm using Python 3 to create a brute-force Vigenere decipherer. Vigenere ciphers basically add strings of letters together.
The way I want my code to work is: the user puts in however many keys they want (this bit's done), the letters are turned into their numbers (also done), then it adds every pair of keys together (working on this, and it's what I need help with) and prints out the two keys and what they added to.
To do this, I need to be able to keep track of which pairs of keys have been added together. How can I do this?
BTW, my current code is below. I'm doing this both for the decoding and for programming practice, so I really just want a way to keep track of added key pairs, not the whole program.
#defines start variables
import math
alph = "abcdefghijklmnopqrstuvwxyz"
keyqty = int(input("how many keys?"))
listofkeys = []
listofindex = []
timer = 0
#gets keys
while True:
    if timer >= keyqty:
        break
    else:
        pass
    listofkeys.append(input("key: ").lower())
    timer += 1
tempkey = ""
#blank before key
for item in listofkeys:
    listofindex.append("")
    for letter in item:
        listofindex.append(alph.find(letter))
timer = 0
newkey = False
key1index = []
key2index = []
endex = []
printletter = ""
doneadds = []
Obviously, it still needs some other work, but some help would be appreciated.
You can either use a set for fast lookup (amortized constant time).
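tried = set()
for word in candidates:  # candidates: whatever sequence of keys you generate
    if word not in tried:
        try_key(word)  # try_key: your own function ("try" is a reserved word in Python)
        tried.add(word)
or use itertools.product() to generate your trials without needing to keep track of the ones already tried.
import itertools
for password in itertools.product(alph, repeat=keyqty):
    try_key(password)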
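For the specific case in the question, adding every pair of keys, itertools.combinations(keys, 2) yields each unordered pair exactly once, so no record of tried pairs is needed at all. The add_keys helper below is a hypothetical sketch that assumes lowercase a to z keys and repeats the shorter key to the length of the longer one, the way a Vigenere key is repeated; change that rule if you want something else.

```python
from itertools import combinations

ALPH = "abcdefghijklmnopqrstuvwxyz"

def add_keys(key1, key2):
    """Add two keys letter by letter, modulo 26 (a=0 ... z=25).
    Assumption: the shorter key is repeated to cover the longer one."""
    length = max(len(key1), len(key2))
    out = []
    for i in range(length):
        a = ALPH.index(key1[i % len(key1)])
        b = ALPH.index(key2[i % len(key2)])
        out.append(ALPH[(a + b) % 26])
    return "".join(out)

keys = ["abc", "bcd", "xyz"]
# combinations() produces each unordered pair once: no bookkeeping required
for k1, k2 in combinations(keys, 2):
    print(k1, k2, add_keys(k1, k2))
# abc bcd bdf
# abc xyz xzb
# bcd xyz yac
```

Addition mod 26 is commutative, so unordered pairs are enough; itertools.permutations(keys, 2) would give both orders if you ever need them.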

AIO Castle Cavalry - My code is too slow, is there a way I can shorten this?

So I am currently preparing for a competition (Australian Informatics Olympiad) and in the training hub, there is a problem in AIO 2018 intermediate called Castle Cavalry. I finished it:
input = open("cavalryin.txt").read()
output = open("cavalryout.txt", "w")
squad = input.split()
total = squad[0]
squad.remove(squad[0])
squad_sizes = squad.copy()
squad_sizes = list(set(squad))
yn = []
for i in range(len(squad_sizes)):
    n = squad.count(squad_sizes[i])
    if int(squad_sizes[i]) == 1 and int(n) == int(total):
        yn.append(1)
    elif int(n) == int(squad_sizes[i]):
        yn.append(1)
    elif int(n) != int(squad_sizes[i]):
        yn.append(2)
ynn = list(set(yn))
if len(ynn) == 1 and int(ynn[0]) == 1:
    output.write("YES")
else:
    output.write("NO")
output.close()
I submitted this code and didn't pass because it was too slow, at 1.952 seconds. The time limit is 1.000 seconds. I wasn't sure how to shorten this, as to me it looks fine. PLEASE keep in mind I am still learning, and I am only an amateur. I started coding only this year, so if the answer is quite obvious, sorry for wasting your time 😅.
Thank you for helping me out!
One performance issue is calling int() over and over on the same entity, or on things that are already int:
if int(squad_sizes[i]) == 1 and int(n) == int(total):
elif int(n) == int(squad_sizes[i]):
elif int(n) != int(squad_sizes[i]):
if len(ynn) == 1 and int(ynn[0]) == 1:
But the real problem is your code doesn't work. And making it faster won't change that. Consider the input:
4
2
2
2
2
Your code will output "NO" (with missing newline) despite it being a valid configuration. This is due to your collapsing the squad sizes using set() early in your code. You've thrown away vital information and are only really testing a subset of the data. For comparison, here's my complete rewrite that I believe handles the input correctly:
with open("cavalryin.txt") as input_file:
    string = input_file.read()

total, *squad_sizes = map(int, string.split())

success = True

while squad_sizes:
    squad_size = squad_sizes.pop()

    for _ in range(1, squad_size):
        try:
            squad_sizes.remove(squad_size)  # eliminate n - 1 others like me
        except ValueError:
            success = False
            break
    else:  # no break
        continue

    break

with open("cavalryout.txt", "w") as output_file:
    print("YES" if success else "NO", file=output_file)
Note that I convert all the input to int early on so I don't have to consider that issue again. I don't know whether this will meet AIO's timing constraints.
I can see some things in there that might be inefficient, but the best way to optimize code is to profile it: run it with a profiler and sample data.
You can easily waste time trying to speed up parts that don't need it without having much effect. Read up on the cProfile module in the standard library to see how to do this and interpret the output. A profiling tutorial is probably too long to reproduce here.
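A minimal profiling sketch, assuming synthetic data (20,000 knights with preferred sizes 1 to 500, stored as strings as in the question). It profiles a list.count()-per-size approach like the question's against collections.Counter and prints the five most expensive entries sorted by cumulative time. Exact timings will vary by machine.

```python
import cProfile
import io
import pstats
from collections import Counter

def count_with_list_count(squad):
    # Like the question: one full scan of the list per distinct size
    return {size: squad.count(size) for size in set(squad)}

def count_with_counter(squad):
    # A single pass over the list
    return dict(Counter(squad))

# Synthetic input: sizes "1".."500", each appearing 40 times
squad = [str(i % 500 + 1) for i in range(20_000)]

profiler = cProfile.Profile()
profiler.enable()
slow = count_with_list_count(squad)
fast = count_with_counter(squad)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
print("same result:", slow == fast)
```

For a whole script, python -m cProfile -s cumulative yourscript.py gives a similar report without changing the code.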
My suggestions, without profiling:
squad.remove(squad[0])
Removing the start of a big list is slow, because the rest of the list has to be shifted down one slot. (Removing the end of the list is faster: lists are typically backed by arrays that are overallocated, with more slots than elements, to make .append() fast, so removing the last element only has to decrease the length and can keep the same array.)
It would be better to set this to a dummy value and remove it when you convert it to a set (sets are backed by hash tables, so removals are fast), e.g.
dummy = object()
squad[0] = dummy # len() didn't change. No shifting required.
...
squad_sizes = set(squad)
squad_sizes.remove(dummy) # Fast lookup by hash code.
Since we know these will all be strings, you can just use None instead of a dummy object, but the above technique works even when your list might contain Nones.
squad_sizes = squad.copy()
This line isn't required; it's just doing extra work. The set() already makes a shallow copy.
n = squad.count(squad_sizes[i])
This line might be the real bottleneck. It's effectively a loop inside a loop, so it basically has to scan the whole list for each outer loop. Consider using collections.Counter for this task instead. You generate the count table once outside the loop, and then just look up the numbers for each string.
You can also avoid generating the set altogether if you do this. Just use the Counter object's keys for your set.
Another point unrelated to performance. It's unpythonic to use indexes like [i] when you don't need them. A for loop can get elements from an iterable and assign them to variables in one step:
from collections import Counter
...
count_table = Counter(squad)
for squad_size, n in count_table.items():
    ...
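Putting the Counter table together with the divisibility rule (knights preferring squad size s can only share a squad with each other, so their number must be a multiple of s), a complete sketch might look like this. The function names are illustrative, and reading from a string instead of cavalryin.txt is only to make it easy to test.

```python
from collections import Counter

def cavalry_ok(sizes):
    """True if the knights can be split into squads matching every preference.
    Knights preferring size s can only be grouped with each other, so the
    number of such knights must be a multiple of s."""
    count_table = Counter(sizes)  # built once, O(n)
    return all(n % squad_size == 0 for squad_size, n in count_table.items())

def solve(text):
    total, *sizes = map(int, text.split())  # first number is the knight count
    return "YES" if cavalry_ok(sizes) else "NO"

print(solve("4\n2\n2\n2\n2\n"))  # YES
print(solve("3\n2\n2\n2\n"))     # NO
```

In the contest program you would read cavalryin.txt, pass its contents to solve(), and write the result followed by a newline to cavalryout.txt.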
You can collect all occurrences of the preferred number for each knight in a dictionary.
Then test if the number of knights with a given preferred number is divisible by that number.
with open('cavalryin.txt', 'r') as f:
    lines = f.readlines()
# convert to int
list_int = [int(a) for a in lines]
# initialise counting dictionary: key: preferred number, item: empty list to collect all knights with preferred number.
collect_dict = {a: [] for a in range(1, 1 + max(list_int[1:]))}
# loop through list, ignoring first entry.
for a in list_int[1:]:
    collect_dict[a].append(a)
# initialise output
out = 'YES'
for key, item in collect_dict.items():
    # check number of items with preference for number is divisible
    # by that number
    if item:  # if list has entries:
        if (len(item) % key) > 0:
            out = 'NO'
            break
with open('cavalryout.txt', 'w') as f:
    f.write(out)

Python Algorithm Challenge?

I have a python function (call it myFunction) that gets as input a list of numbers, and, following a complex calculation, returns back the result of the calculation (which is a number).
The function looks like this:
def myFunction( listNumbers ):
    # initialize the result of the calculation
    calcResult = 0
    # looping through all indices, from 0 to the last one
    for i in xrange(0, len(listNumbers), 1):
        # some complex calculation goes here, changing the value of 'calcResult'
        pass
    # let us now return the result of the calculation
    return calcResult
I tested the function, and it works as expected.
Normally, myFunction is provided a listNumbers argument that contains 5,000,000 elements. As you may expect, the calculation takes time. I need this function to run as fast as possible.
Here comes the challenge: assume that the time now is 5am, and that listNumbers contains just 4,999,999 values in it. Meaning, its LAST VALUE is not yet available. This value will only be available at 6am.
Obviously, we can do the following (1st mode): wait until 6am. Then, append the last value into listNumbers, and then, run myFunction. This solution works, BUT it will take a while before myFunction returns our calculated result (as we need to process the entire list of numbers, from the first element on). Remember, our goal is to get the results as soon as possible past 6am.
I was thinking about a more efficient way to solve this (2nd mode): since (at 5am) we have listNumbers with 4,999,999 values in it, let us immediately start running myFunction. Let us process whatever we can (remember, we don't have the last piece of data yet), and then -- exactly at 6am -- 'plug in' the new data piece -- and generate the computed result. This should be significantly faster, as most of the processing will be done BEFORE 6am, hence, we will only have to deal with the new data -- which means the computed result should be available immediately after 6am.
Let's suppose that there's no way for us to inspect the code of myFunction or modify it. Is there ANY programming technique / design idea that will allow us to take myFunction AS IS, and do something with it (without changing its code) so that we can have it operate in the 2nd mode, rather than the 1st one?
Please do not suggest using C++ / NumPy + Cython / parallel computing etc. to solve this problem. The goal here is to see if there's any programming technique or design pattern that can be easily used to solve such problems.
You could use a generator as an input. The generator will only return when there is data available to process.
Update: thanks for the brilliant comment, I wanted to remove this entry :)
class lazylist(object):
    def __init__(self):
        self.cnt = 0
        self.length = 5000000

    def __iter__(self):
        return self

    def __len__(self):
        return self.length

    def next(self):
        if self.cnt < self.length:
            self.cnt += 1
            #return data here or wait for it
            return self.cnt #just return a counter for this example
        else:
            raise StopIteration()

    __next__ = next  # Python 3 name for the same method

    def __getitem__(self, i):
        #again, block till you have data.
        return i+1 #simple counter

myFunction(lazylist())
Update: As you can see from the comments and other solutions, your loop construct and len call cause a lot of headaches; if you can eliminate them, you can use a much more elegant solution. for e in li or enumerate is the Pythonic way to go.
By "list of numbers", do you mean an actual built-in list type?
If not, it's simple. Python uses duck-typing, so passing any sequence that supports iteration will do. Use the yield keyword to pass a generator.
def delayed_list():
    for val in numpy_array[:4999999]:
        yield val
    wait_until_6am()
    yield numpy_array[4999999]
and then,
myFunction(delayed_list())
If yes, then it's trickier :)
Also, check out PEP8 for recommended Python code style:
no spaces immediately inside parentheses
my_function instead of myFunction
for i, val in enumerate(numbers): instead of for i in xrange(0, len(listNumbers), 1): etc.
Subclass list so that when the function tries to read the last value, it blocks until another thread provides the value.
import threading
import time

class lastblocks(list):
    def __init__(self,*args,**kwargs):
        list.__init__(self,*args,**kwargs)
        self.e = threading.Event()
    def __getitem__(self, index):
        v1 = list.__getitem__(self,index)
        if index == len(self)-1:
            self.e.wait()
            v2 = list.__getitem__(self,index)
            return v2
        else:
            return v1

l = lastblocks(range(5000000-1)+[None])

def reader(l):
    s = 0
    for i in xrange(len(l)):
        s += l[i]
    print s

def writer(l):
    time.sleep(10)
    l[5000000-1]=5000000-1
    l.e.set()
    print "written"

reader = threading.Thread(target=reader, args=(l,))
writer = threading.Thread(target=writer, args=(l,))
reader.start()
writer.start()
prints:
written
12499997500000
For NumPy:
import threading
import time
import numpy as np

class lastblocks(np.ndarray):
    def __new__(cls, arry):
        obj = np.asarray(arry).view(cls)
        obj.e = threading.Event()
        return obj
    def __array_finalize__(self, obj):
        if obj is None: return
        self.e = getattr(obj, 'e', None)
    def __getitem__(self, index):
        v1 = np.ndarray.__getitem__(self,index)
        if index == len(self)-1:
            self.e.wait()
            v2 = np.ndarray.__getitem__(self,index)
            return v2
        else:
            return v1

l = lastblocks(np.asarray(range(5000000-1)+[None]))

def reader(l):
    s = 0
    for i in xrange(len(l)):
        s += l[i]
    print s

def writer(l):
    time.sleep(10)
    l[5000000-1]=5000000-1
    l.e.set()
    print "written"

reader = threading.Thread(target=reader, args=(l,))
writer = threading.Thread(target=writer, args=(l,))
reader.start()
writer.start()
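The list-based code above is Python 2 (print statements, xrange, concatenating a range with a list). A Python 3 sketch of the same list-subclass idea, scaled down to 1,000 elements and a 0.1 second delay, might look like this. my_function is a hypothetical stand-in for the untouchable function. Only index access goes through __getitem__: a plain for loop over a list subclass uses list's own iterator and would not block.

```python
import threading
import time

class LastBlocks(list):
    """A list whose last element blocks readers until `ready` is set."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.ready = threading.Event()

    def __getitem__(self, index):
        if isinstance(index, int) and index in (len(self) - 1, -1):
            self.ready.wait()  # block until the writer has stored the last value
        return super().__getitem__(index)

def my_function(numbers):
    # Hypothetical stand-in: reads every element by index, like the question
    total = 0
    for i in range(len(numbers)):
        total += numbers[i]
    return total

n = 1000
data = LastBlocks(list(range(n - 1)) + [None])
results = []

reader = threading.Thread(target=lambda: results.append(my_function(data)))
reader.start()
time.sleep(0.1)       # simulated wait for "6am"
data[n - 1] = n - 1   # __setitem__ is not overridden, so this does not block
data.ready.set()
reader.join()
print(results[0])     # 499500
```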
Memory protection is a general way to solve this type of problem when the techniques suggested in the other answers (generators and mock objects) are unavailable.
Memory protection is a hardware feature (implemented by the MMU) that raises a fault when a program tries to access a forbidden area of memory, usually controllable at the page level. The fault handler can then take appropriate action, for example suspending the program until the data is ready.
So in this case you'd protect the last page of the list's storage, and the fault handler would wait until 06:00 before allowing the program to continue.
You could just create your own iterator to iterate over the 5,000,000 elements. This would do whatever you need to do to wait around for the final element (can't be specific since the example in the question is rather abstract). I'm assuming you don't care about the code hanging until 6:00, or know how to do it in a background thread.
More information about writing your own iterator is at http://docs.python.org/library/stdtypes.html#iterator-types
There is a simpler generator solution:
def fnc(lst):
    result = 0
    index = 0
    while index < len(lst):
        while index < len(lst):
            # ... do some manipulations here ...
            index += 1
        yield result

lst = [1, 2, 3]
gen = fnc(lst)
print gen.next()
lst.append(4)
print gen.next()
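A Python 3 sketch of the same resumable-generator idea with an actual calculation (a running sum standing in for the complex one), scaled down to 1,000 values. One deliberate change: the outer loop is while True, so the generator keeps yielding the current result instead of raising StopIteration when next() is called before new data arrives. Like the answer, it assumes you can rewrite the calculation as a generator.

```python
def running_sum(lst):
    """Process whatever is in lst, yield the result so far, and resume from
    the same index the next time next() is called."""
    result = 0
    index = 0
    while True:
        while index < len(lst):
            result += lst[index]  # stand-in for "some complex calculation"
            index += 1
        yield result

data = list(range(999))   # values available "at 5am"
gen = running_sum(data)
partial = next(gen)       # most of the work happens here, ahead of time
data.append(999)          # the last value arrives "at 6am"
final = next(gen)         # only the new element is processed
print(partial, final)     # 498501 499500
```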
I'm a little confused about not being able to investigate myFunction. At the very least, you have to know whether your list is iterated or accessed by index. Your example suggests an index is used. If you want to take advantage of iterators/generators, you have to iterate. I know you said myFunction is unchangeable, but I just want to point out that the most Pythonic version would be:
def myFunction(listNumbers):
    calcResult = 0
    # enumerate if you really need an index of element in array
    for n, v in enumerate(listNumbers):
        # some complex calculation goes here, changing the value of 'calcResult'
        pass
    return calcResult
And now you can start introducing nice ideas. One is wrapping the list in your own type that provides an __iter__ method (as a generator); it could return a value if one is available, wait for more data if you expect any, or return after yielding the last element.
If you have to access the list by index, you can use __getitem__ as in Dan D's example. It'll have a limitation, though: you'll have to know the size of the array in advance.
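A minimal sketch of such a wrapper, assuming the function iterates (as in the enumerate version above) rather than calling len() or indexing. The WaitingList class, its supply() method and the expected count are all hypothetical: the wrapper needs some way to know how many values will eventually exist, and a queue.Queue lets another thread hand over late values. threading.Timer stands in for waiting until 6am.

```python
import queue
import threading

class WaitingList:
    """Iterable that yields the values it already has, then waits for the rest."""

    def __init__(self, values, expected):
        self._values = list(values)
        self._expected = expected          # assumption: total count is known up front
        self._incoming = queue.Queue()

    def supply(self, value):
        # Called by the producer (e.g. at 6am) to hand over a late value
        self._incoming.put(value)

    def __iter__(self):
        yield from self._values
        for _ in range(self._expected - len(self._values)):
            value = self._incoming.get()   # blocks until supply() is called
            self._values.append(value)     # cache it so later iterations don't block
            yield value

def my_function(numbers):
    calc_result = 0
    for n, v in enumerate(numbers):
        calc_result += v  # stand-in for "some complex calculation"
    return calc_result

data = WaitingList(range(999), expected=1000)
threading.Timer(0.1, data.supply, args=(999,)).start()  # the late value arrives
result = my_function(data)
print(result)  # 499500
```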
Couldn't you simply do something like this:
processedBefore6 = myFunction([1,2,3]) # the first 4,999,999 vals.
while lastVal.notavailable:
    sleep(1)
processedAfter6 = myFunction([processedBefore6, lastVal])
If the effects are linear (step 1 -> step 2 -> step 3, etc) this should allow you to do as much work as possible up front, then catch the final value when it's available and finish up.
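Whether this works depends entirely on the calculation: feeding the partial result back in is only correct when the running value fully summarises everything processed so far. A small sketch with two hypothetical stand-ins for myFunction shows the difference: a sum survives the split, an arithmetic mean does not.

```python
def my_sum(numbers):
    return sum(numbers)

def my_mean(numbers):
    return sum(numbers) / len(numbers)

before_6 = [1, 2, 3]   # values available early
last_val = 6           # value that arrives at 6am

# Sum: re-feeding the partial result gives the same answer as one full run
print(my_sum([my_sum(before_6), last_val]), my_sum(before_6 + [last_val]))     # 12 12
# Mean: the partial result has lost how many values it summarised
print(my_mean([my_mean(before_6), last_val]), my_mean(before_6 + [last_val]))  # 4.0 3.0
```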
