How can I loop over true values in Python - python

I have a boolean array in Python and I want to do a calculation on the cells where the value is True. Currently I am using a nested for loop to go through all the cells to find the ones with True values. However, running the program takes a lot of time. Is there a faster way to do this?
for i in range(0, latstep):
    for j in range(0, lonstep):
        if coastline[i, j] == True:
            ...
Thanks for your help!

You might consider using Executor.map() from concurrent.futures, or similar, to process the array elements in parallel, assuming there are no dependencies between the elements.
Another possibility is maintaining a list of the 'true' values when you initially calculate them:
coastlineCache = []
c = foo()
coastline[i][j] = c
if c:
    coastlineCache.append(c)
# later
for c in coastlineCache:
    process_true_item(c)
If, as you've alluded to above, you need the array indices, cache them as a tuple:
coastlineCache = []
c = foo()
coastline[i][j] = c
if c:
    coastlineCache.append((i, j))
# later
for c in coastlineCache:
    process_true_item(c[0], c[1])  # e.g. (i, j)

You can use nested list comprehensions; a comprehension is slightly faster than a generic for loop on smaller input sizes:
final_list = [[foo(b) for b in i if b] for i in coastline]
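Since the question indexes the array as coastline[i,j], it is probably a NumPy array. Under that assumption, a minimal sketch of letting NumPy find the True cells instead of scanning every cell in Python could look like this. The small coastline and values arrays are made-up stand-ins, not the asker's data.

```python
import numpy as np

# Hypothetical 3x4 boolean mask standing in for the asker's coastline array.
coastline = np.array([
    [False, True,  False, False],
    [True,  False, False, True],
    [False, False, False, False],
])

# np.argwhere returns one (row, col) pair per True cell, in row-major order.
true_cells = [(int(i), int(j)) for i, j in np.argwhere(coastline)]

for i, j in true_cells:
    pass  # do the per-cell calculation here; only True cells are visited

# If the calculation only needs values from a second array of the same shape,
# boolean indexing selects them all at once without any Python loop.
values = np.arange(12).reshape(3, 4)  # hypothetical data array
selected = values[coastline]
```

Whether this helps depends on how much of the per-cell work can itself be expressed as array operations; the loop over true_cells still runs in Python.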

Related

Why does my for loop skip some items after remove? [Python] [duplicate]

This question already has answers here:
How to remove items from a list while iterating?
(25 answers)
Closed 1 year ago.
I'm new to Python and I'm having trouble understanding why this code doesn't work as I expect.
I want to calculate the primitive Pythagorean triples (a^2+b^2=c^2) for a,b,c < 100 and put them in a tuple.
This is the code:
n=100
#Here I calculate all the Pythagorean triples and put them in a list (I wanted to use a nested list comprehension)
d=[(ap,b,c) for ap in range(1,n+1) for b in range(ap,n+1) for c in range(b,n+1) if ap**2 + b**2 == c**2 ]
#it works
#now I want to find the primitive ones:
q=[]
for q in d: #I take each triple
    #I check if it is primitive
    v=2
    for v in range(2,q[0]) :
        if q[0]%v==0 and q[1]%v==0 and q[2]%v== 0 :
            d.remove(q) #if not, I remove it and exit from this loop
            break
#then I would expect it to read all the triples, but it doesn't
#it misses the triples after the one I have removed
Can you tell me why?
Is there another way to solve it?
Do I miss some step ?
The missing 3-tuples are not caused by the break but by modifying the list while you loop over it. When you remove an element from a list, the indexes of the remaining elements shift down by one, so the loop skips the element that moved into the removed element's position. It is never good practice to remove elements from a list you are iterating over. One usually iterates over a copy of the list, or uses a function such as filter.
Also, you can remove v = 2. There's no need to set the value of v to 2 for every iteration when you already do so with the instruction for v in range(2,q[0])
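The skipping described above can be reproduced on a tiny list. This is only an illustrative sketch with made-up values, not the asker's data.

```python
# Removing while iterating: the second 2 slides into the slot the loop
# has already passed, so it is never examined.
items = [1, 2, 2, 3]
for x in items:
    if x == 2:
        items.remove(x)
after_in_place = items

# Building a new list instead visits every element exactly once.
original = [1, 2, 2, 3]
filtered = [x for x in original if x != 2]
```

After the first loop one of the 2s is still in the list, while the comprehension drops both.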
Iterating over a copy of d
If you do list(d) then you create a clone of the list. The original list can then be modified without causing problems, because your loop iterates over the clone, which does not change.
n=100
d=[(ap,b,c) for ap in range(1,n+1) for b in range(ap,n+1) for c in range(b,n+1) if ap**2 + b**2 == c**2]
q=[]
for q in list(d):
    for v in range(2,q[0]) :
        if q[0]%v==0 and q[1]%v==0 and q[2]%v== 0 :
            d.remove(q)
            break
Use the function filter
For the filter function, you need to define a function that is applied to every element of your list. The filter function uses that defined function to build a new sequence from the elements that pass. If the defined function returns True, the element is kept; if it returns False, the element is dropped. filter() returns an iterator, so you need to build the list from it.
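def removePrimitives(q):
    # Despite the name, this returns True for primitive triples (kept by filter)
    # and False for triples with a common divisor (dropped by filter).
    for v in range(2, q[0]):
        if q[0] % v == 0 and q[1] % v == 0 and q[2] % v == 0:
            return False
    return True
n=100
d=[(ap,b,c) for ap in range(1,n+1) for b in range(ap,n+1) for c in range(b,n+1) if ap**2 + b**2 == c**2]
q=[]
d = list(filter(removePrimitives, d))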
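As an alternative to trial division, a triple is primitive exactly when the greatest common divisor of its three members is 1. The following is a minimal sketch using math.gcd from the standard library. The names triples and primitive are chosen here for illustration and are not from the original answer.

```python
from math import gcd

n = 100

# All triples with a <= b <= c <= n, as in the question.
triples = [(a, b, c)
           for a in range(1, n + 1)
           for b in range(a, n + 1)
           for c in range(b, n + 1)
           if a**2 + b**2 == c**2]

# Keep only the triples whose members share no common factor.
# A new list is built, so nothing is removed during iteration.
primitive = [t for t in triples if gcd(gcd(t[0], t[1]), t[2]) == 1]
```

Because the result is a new list, the original triples list stays intact and the skipping problem cannot occur.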
Bonus: debugging
When it comes to coding, no matter what language, I believe one of the first things you should learn is how to debug. A very handy interactive debugger for Python is the third-party package ipdb (the standard library's pdb accepts the same commands).
Here are the commands:
n[ext]: next line, step over
s[tep]: step into next line or function.
c[ontinue]: continue
l[ist]: show more lines
p <variable>: print value
q[uit]: quit debugging
help: show all commands
help <command>: show help
Install the package with your pip installer. Here's how you could have used it in your code to see exactly what happens when a primitive tuple is found and you break from the inner loop:
import ipdb
n=100
d=[(ap,b,c) for ap in range(1,n+1) for b in range(ap,n+1) for c in range(b,n+1) if ap**2 + b**2 == c**2 ]
q=[]
for q in d:
    for v in range(2,q[0]) :
        if q[0]%v==0 and q[1]%v==0 and q[2]%v== 0 :
            ipdb.set_trace() # This sets the breakpoint
            d.remove(q)
            break
At this breakpoint you can print the variable q, and d, and see how it gets modified and what happens after the break is executed.

Efficient way to add items of list to another list with a map

I have two lists of floats L1, L2, of lengths a, b respectively. I also have a list F, of length a, whose values are integers in the range [-1,b-1]. I want to update L2 in the following way:
for i in filter(lambda x: F[x]+1, range(len(F))):
    L2[F[i]] += L1[i]
Basically, F is a function of L1's index. For each index, i, of L1, if F[i] = -1, we do nothing, otherwise, we take L1's i-th item and add it to L2's F[i]-th item.
I am doing this in a program where the lengths a and b will grow exponentially as I make my results more accurate (also, F is roughly 50% -1's). I realize this already takes linear time, but I was wondering if there was some way to improve the constant factor, possibly through a list comprehension or sum? Or, if I only need to know the contents of L2 after multiple updates, is there a practical way to store these updates and do them all at once in a faster manner?
What about the case where I have two lists of lists LL1, LL2, each containing c lists of lengths a, and b respectively, with just one list/map F? If I want LL1[i] to update LL2[i] for all i in [0,c-1], is there a smart way to do this, or is there nothing better than doing each i one by one?
Clarification: converting to numpy structures is completely acceptable, I just lack prior-knowledge about how to utilize numpy efficiently.
Your code is fairly efficient as is. The only improvement that I can see comes from avoiding the lambda function, which adds the overhead of a function call on every iteration. Instead, you can use the enumerate function to generate the indices and values of F to iterate over, and filter the values of F with a simple if statement:
for i, j in enumerate(F):
    if j != -1:
        L2[j] += L1[i]
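For the list-of-lists case raised in the question, one possible sketch is to compute the (source index, target index) pairs from F once and reuse them for every row, since F is shared. The helper names build_pairs and apply_updates and the sample data are hypothetical, introduced only for illustration.

```python
def build_pairs(F):
    """Return (i, j) pairs for every index i whose target j = F[i] is not -1."""
    return [(i, j) for i, j in enumerate(F) if j != -1]

def apply_updates(pairs, src, dst):
    """Add src[i] onto dst[j] for each precomputed pair, in place."""
    for i, j in pairs:
        dst[j] += src[i]

# Hypothetical data: a = 4, b = 3, c = 2.
F = [0, -1, 0, 2]
pairs = build_pairs(F)  # filtering on -1 happens only once

LL1 = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
LL2 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
for src, dst in zip(LL1, LL2):
    apply_updates(pairs, src, dst)
```

This only trims the constant factor. If converting to NumPy is acceptable, np.add.at(L2, F[mask], L1[mask]) with mask = F != -1 performs the same accumulation on arrays and correctly handles repeated target indices.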

Test if all N variables are different

I want to make a condition where all selected variables are not equal.
My solution so far is to compare every pair which doesn't scale well:
if A!=B and A!=C and B!=C:
I want to do the same check for multiple variables, say five or more, and it gets quite confusing with that many. What can I do to make it simpler?
Create a set and check whether the number of elements in the set is the same as the number of variables in the list that you passed into it:
>>> variables = [a, b, c, d, e]
>>> if len(set(variables)) == len(variables):
...     print("All variables are different")
A set doesn't have duplicate elements so if you create a set and it has the same number of elements as the number of elements in the original list then you know all elements are different from each other.
If you can hash your variables (and, uh, your variables have a meaningful __hash__), use a set.
def check_all_unique(li):
    unique = set()
    for i in li:
        if i in unique: return False #hey I've seen you before...
        unique.add(i)
    return True #nope, saw no one twice.
O(n) worst case. (And yes, I'm aware that you can also len(li) == len(set(li)), but this variant returns early if a match is found)
If you can't hash your values (for whatever reason) but can meaningfully compare them:
def check_all_unique(li):
    li.sort()
    for i in range(1,len(li)):
        if li[i-1] == li[i]: return False
    return True
O(nlogn), because sorting. Basically, sort everything, and compare pairwise. If two things are equal, they should have sorted next to each other. (If, for some reason, your __cmp__ doesn't sort things that are the same next to each other, 1. wut and 2. please continue to the next method.)
And if ne is the only operator you have....
import operator
import itertools
li = [...]  # a list containing all the variables I must check
if all(operator.ne(*i) for i in itertools.combinations(li,2)):
    pass  # do something
I'm basically using itertools.combinations to pair off all the variables, and then using operator.ne to check for not-equalness. This has a worst-case time complexity of O(n^2), although it should still short-circuit (because generators, and all is lazy). If you are absolutely sure that ne and eq are opposites, you can use operator.eq and any instead.
Addendum: Vincent wrote a much more readable version of the itertools variant that looks like
import itertools
lst = [...]  # a list containing all the variables I must check
if all(a!=b for a,b in itertools.combinations(lst,2)):
    pass  # do something
Addendum 2: Uh, for sufficiently large datasets, the sorting variant should possibly use heapq. Still would be O(nlogn) worst case, but O(n) best case. It'd be something like
import heapq
def check_all_unique(li):
    heapq.heapify(li) #O(n), compared to sorting's O(nlogn)
    prev = heapq.heappop(li)
    for _ in range(len(li)): #O(n)
        current = heapq.heappop(li) #O(logn)
        if current == prev: return False
        prev = current
    return True
Put the values into a container type. Then just loop through the container, comparing each pair of values. It would take about O(n^2).
pseudo code:
a[0] = A; a[1] = B ... a[n];
for i = 0 to n do
    for j = i + 1 to n do
        if a[i] == a[j]
            condition failed
You can enumerate a list and check that all values are the first occurrence of that value in the list:
a = [5, 15, 20, 65, 48]
if all(a.index(v) == i for i, v in enumerate(a)):
    print("all elements are unique")
This allows for short-circuiting once the first duplicate is detected due to the behaviour of Python's all() function.
Or equivalently, enumerate a list and check if there are any values which are not the first occurrence of that value in the list:
a = [5, 15, 20, 65, 48]
if not any(a.index(v) != i for i, v in enumerate(a)):
    print("all elements are unique")
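The set-based and pairwise approaches from the answers above can be combined into one helper. This is a sketch only; the function name all_different is made up here. It tries the O(n) set check first and falls back to the O(n^2) pairwise comparison when a value is not hashable.

```python
from itertools import combinations

def all_different(*values):
    """Return True if no two of the given values compare equal."""
    try:
        # Fast path: works when every value is hashable.
        return len(set(values)) == len(values)
    except TypeError:
        # Fallback for unhashable values such as lists or dicts.
        return all(a != b for a, b in combinations(values, 2))
```

With this helper the original condition becomes `if all_different(A, B, C):`, and it extends to five or more variables without extra comparisons to write out.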

Startswith for lists in python?

Is there an equivalent of starts with for lists in python ?
I would like to know if a list a starts with a list b. like
len(a) >= len(b) and a[:len(b)] == b ?
You can just write a[:len(b)] == b.
If len(b) > len(a), no error is raised; the slice is simply shorter than b, so the comparison is False.
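For large lists, this will be more efficient, because it does not copy a slice of a. Note that the length check is required here, since zip stops at the end of the shorter list:
# Python 3's built-in zip is lazy; on Python 2, use itertools.izip instead
result = len(a) >= len(b) and all(x==y for (x, y) in zip(a, b))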
For small lists, your code is fine. The length check can be omitted, as DavidK said, but it would not make a big difference.
PS: No, there's no built-in to check if a list starts with another list, but as you already know, it's trivial to write such a function yourself.
It does not get much simpler than what you have (and the check on the lengths is not even needed).
For an overview of more extended/elegant options for finding sublists in lists, you can check out the main answer to this post: elegant find sub-list in list
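Wrapping the check in a small function makes the intent explicit. The name starts_with below is an assumption for illustration, not an existing built-in.

```python
def starts_with(a, b):
    """Return True if sequence a begins with all the elements of sequence b."""
    # The length check is needed because zip stops at the shorter input.
    return len(a) >= len(b) and all(x == y for x, y in zip(a, b))
```

For example, starts_with([1, 2, 3], [1, 2]) is true, while starts_with([1], [1, 2]) is false because the prefix is longer than the list.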

python list generation/saving bug

I am trying to make a program that prints all the possible combinations from a to zzz. I tried to add a save state feature, and it works fine except for one bug.
Let's say I interrupted the program when it printed something like e. When I execute the program again, it works fine until z, but after z, instead of printing aa, it prints ba and continues from ba. This happens right after it prints zz too: it prints baa instead of aaa. How can I fix this?
Here is what I did so far:
import pickle,os,time
alphabet="abcdefghijklmnopqrstuvwxyz"
try:
    if os.path.isfile("save.pickle")==True:
        with open("save.pickle","rb") as f:
            tryn=pickle.load(f)
        for i in range(3):
            a=[x for x in alphabet]
            for j in range(i):
                a=[x+i for x in alphabet for i in a]
            b=a[tryn:]
            for k in b:
                print(k)
                time.sleep(0.01)
                tryn+=1
    else:
        tryn=0
        for i in range(3):
            a=[x for x in alphabet]
            for j in range(i):
                a=[x+i for x in alphabet for i in a]
            for k in a:
                print(k)
                tryn+=1
                time.sleep(0.01)
except KeyboardInterrupt:
    with open("save.pickle","wb") as f:
        pickle.dump(tryn,f)
Whether you're using Python 2 or, as the tag suggests, Python 3, this exists in the standard library already. See itertools.product, which is available in both versions, for a simple way to solve this problem.
for i in range(3):
    a=[x for x in alphabet]
    for j in range(i):
        a=[x+i for x in alphabet for i in a]
    b=a[tryn:]
Here's your bug. You skip the first tryn strings of every length, rather than just the first tryn strings. This would be easier to recognize in the output if it weren't for the following:
    for k in b:
        print(k)
        time.sleep(0.01)
        tryn+=1
You modify tryn, the number of things you're skipping. When you print out length-2 strings, you skip a number of them equal to the number of length-1 strings. When you print out length-3 strings, you skip a number of them equal to the number of length-2 strings. If tryn were bigger than the number of length-1 strings, you would skip even more.
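One way to avoid skipping per length, sketched here under the assumption that the saved counter is the total number of strings already printed, is to generate the whole sequence lazily and skip the counter's worth of items exactly once with itertools.islice. The names seq and resume are illustrative, and the pickle handling is left out.

```python
import itertools

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def seq(alphabet, max_len):
    """Yield every string of length 1..max_len in order: a, ..., z, aa, ..."""
    for length in range(1, max_len + 1):
        for letters in itertools.product(alphabet, repeat=length):
            yield "".join(letters)

def resume(alphabet, max_len, already_printed):
    """Skip the first already_printed strings of the whole sequence, once."""
    return itertools.islice(seq(alphabet, max_len), already_printed, None)

# Resuming after "e" (5 strings printed) continues with "f"
# and later reaches "aa" rather than "ba".
resumed = list(resume(ALPHABET, 2, 5))
```

In the real program the loop over resume(...) would print each string and increment the counter, and the KeyboardInterrupt handler would pickle the counter as before.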
Your problem is almost certainly here:
a=[x for x in alphabet]
for j in range(i):
    a=[x+i for x in alphabet for i in a]
Perhaps you shouldn't assign the in-loop value to a, but instead use a different name? Otherwise, you are changing what you use every time through the loop....
Edit: More detail. So, technically user2357112's answer is more correct, but I'm amending mine. The initial answer was just from a quick reading, so the other answer is close to the original intent. But, the original version is inefficient (for more reasons than not using product :), since you are generating the inner loops more than once. So let's walk through why this is a bad idea, as an educational exercise:
Initial algorithm:
for i in range(n):
    assign a to alphabet
    for j in range(i):
        i times, we rewrite a to be all combinations of the current set against the alphabet.
Note that for this algorithm, to generate the length(n) product, we have to generate all previous products length(n-1), length(n-2), ..., length(1). But you aren't saving those.
You'd be better off doing something like this:
from itertools import product
# alphabet is a string, so build lists from it (str has no extend method)
sum_list = list(alphabet)
#get a copy
product_list = list(alphabet)
#Are we starting at 0, or 1? In any case, skip the first, since we preloaded it
for i in range(1, n):
    # Your existing list comprehension was equivalent here, and could still be used
    # it MIGHT be faster to do '%s%s'%(x,y) instead of x+y... but maybe not
    # with these short strings
    # This comprehension takes the result of the last iteration, and makes the next iteration
    product_list = [x+y for x,y in product(product_list, alphabet)]
    # So product list is JUST the list for range (n) - i.e. if we are on loop 2, this
    # is aaa...zzz. But you want all lengths together. So, as you go, add these
    # sublists to a main list.
    sum_list.extend(product_list)
Overall, you are doing a lot less work.
Couple other things:
You're using i as a loop variable, then re-using it in the loop comprehension. This is conflicting, and probably not working the way you'd expect.
If this is to learn how to write save/restore type apps... it's not a good one. Note that the restore function is re-calculating every value to be able to get back where it left off - if you could rewrite this algorithm to write more information out to the file (such as the current value of product_list) and make it more generator-like, then it will actually work more like a real-world example.
Here is how I would suggest solving this problem in Python. I didn't implement the save state feature; this sequence is not a really long one and your computer should be able to produce it pretty fast, so I don't think it is worth the effort to try to make it cleanly interruptible.
import itertools as it
def seq(alphabet, length):
    for c in range(1, length+1):
        for p in it.product(alphabet, repeat=c):
            yield ''.join(p)
alphabet="abcdefghijklmnopqrstuvwxyz"
for x in seq(alphabet, 3):
    print(x)
If you really wanted to, you could make a one-liner using itertools. I think this is too hard to read and understand; I prefer the above version. But this does work and will be somewhat faster, due to the use of itertools.chain and itertools.imap() rather than Python for loops.
import itertools as it
def seq(alphabet, length):
    return it.imap(''.join, it.chain.from_iterable(it.product(alphabet, repeat=c) for c in range(1, length+1)))
alphabet="abcdefghijklmnopqrstuvwxyz"
for x in seq(alphabet, 3):
    print(x)
In Python 3.x you could just use map() rather than itertools.imap().
