Python list comprehension for identifying and modifying a sequence - python

I have a method which iterates over a list of numbers, identifies sequences of 0, non-zero, 0, and then 'normalizes' the value in between to 0.
Here is my code:
for index in range(len(array)-2):
    if array[index] == 0 and array[index + 1] != 0 and array[index + 2] == 0:
        array[index + 1] = 0
This currently works fine, and I have further methods to detect sequences of 0, nz, nz, 0 etc.
I've been looking into list comprehensions in Python, but having trouble figuring out where to start with this particular case. Is it possible to do this using list comprehension?

From the comments and advice given, it seems that my original code is the simplest and perhaps the most efficient way of performing the process. No further answers are necessary.

You might try something like
new_array = [0 if (array[i-1] == array[i+1] == 0)
             else array[i]
             for i in range(1, len(array)-1)]
# More readable, but far less efficient (builds and rebinds a new list)
array = [array[0]] + new_array + [array[-1]]
# More efficient, but less readable (updates the existing list in place)
# array[1:-1] = new_array
I've adjusted the range you iterate over to add some symmetry to the condition, and take advantage of the fact that you don't really need to check the value of array[i]; if it's 0, there's no harm in explicitly setting the new value to 0 anyway.
Still, this is not as clear as your original loop, and unnecessarily creates a brand new list rather than modifying your original list only where necessary.
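A minimal sketch comparing the two approaches side by side, assuming the input is a list of numbers. The function names normalize_loop and normalize_comprehension are hypothetical wrappers around the question's loop and the answer's slice-assignment variant.

```python
def normalize_loop(array):
    # The question's original in-place loop, wrapped in a function.
    for index in range(len(array) - 2):
        if array[index] == 0 and array[index + 1] != 0 and array[index + 2] == 0:
            array[index + 1] = 0


def normalize_comprehension(array):
    # Build the new interior from the unmodified neighbours, then write it
    # back over array[1:-1] so the caller's list object is updated in place.
    array[1:-1] = [0 if array[i - 1] == array[i + 1] == 0 else array[i]
                   for i in range(1, len(array) - 1)]


a = [0, 7, 0, 3, 3, 0, 5, 0]
b = list(a)
normalize_loop(a)
normalize_comprehension(b)
print(a, b)  # both are [0, 0, 0, 3, 3, 0, 0, 0]
```

Because the right-hand side is evaluated completely before the slice assignment, the comprehension always reads the original neighbours; for this particular 0, non-zero, 0 pattern that gives the same result as the in-place loop.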

Not everything should be a comprehension. If you wish to be torturous though:
def f(a):
    [a.__setitem__(i + 1, 0) for i, (x, y, z) in enumerate(zip(a, a[1:], a[2:]))
     if x == z == 0 and y != 0]
Then
>>> a = [1, 2, 0, 1, 0, 4]
>>> f(a)
>>> a
[1, 2, 0, 0, 0, 4]
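For comparison, the same three-element sliding window reads more naturally as an ordinary loop, since the work is a side effect rather than building a list. A sketch, with zero_isolated as a hypothetical function name:

```python
def zero_isolated(a):
    # zip over three staggered slices yields every (left, middle, right) window;
    # the slices are copies, so the checks always see the original values.
    for i, (x, y, z) in enumerate(zip(a, a[1:], a[2:])):
        if x == z == 0 and y != 0:
            a[i + 1] = 0


a = [1, 2, 0, 1, 0, 4]
zero_isolated(a)
print(a)  # [1, 2, 0, 0, 0, 4]
```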

Related

How to make a list comprehension with enumerate while resetting a variable in python

Let's say I want to get the number of jumps between consecutive 1's in a binary string, say 169, which is 10101001.
The answer would then be 3, 2, 2, because when the algorithm starts at the rightmost digit of the binary number, it needs to move three places to the left to reach the next 1, two places to reach the next 1 after that, and so on.
So the algorithm should have a counter that starts at 0, increments by one while it finds 0's, and resets each time it reaches a 1.
I need the output to be a list built with a list comprehension.
This is what I got so far:
number = 169
list = []
c = 1
for i in bin(number>>1)[:1:-1]:
    if i == '1':
        list.append(c)
        c = 0
    c += 1
The algorithm works, but the idea is to turn it into one line of code using a list comprehension. I think there should be a way to do it using enumerate().
Something like:
number = 169
list = [i for i, c in enumerate(bin(number>>1)[:1:-1], 1) if c == '1']
The problem is that the output will be [3, 5, 7] instead of [3, 2, 2], because i (the index variable) doesn't reset.
I'm looking for an answer that isn't just list[a+1] - list[a], but a more elegant and efficient solution.
Here's a one-liner for this problem that's most probably not readable.
s = "10101001"
result = [p - q for p, q in zip([index for index, a in enumerate(s[::-1]) if a == "1"][1:], [index for index, b in enumerate(s[::-1]) if b == "1"][:s.count("1")-1])]
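The same computation is easier to follow if the positions of the 1s are computed once and then reused. This sketch is still a difference of neighbouring positions, just without repeating the inner comprehension:

```python
s = "10101001"

# Positions of every '1', counted from the rightmost digit.
ones = [i for i, bit in enumerate(reversed(s)) if bit == "1"]

# Distance between each pair of neighbouring 1s.
result = [q - p for p, q in zip(ones, ones[1:])]
print(result)  # [3, 2, 2]
```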
You can use groupby here:
from itertools import groupby
bs = "10101001"
result = [
    sum(1 for _ in g) + 1  # this can also be something like len(list(g)) + 1
    for k, g in groupby(reversed(bs))
    if k == "0"
]
You cannot really do this easily with just a list comprehension, because what you want cannot be expressed with just mapping/filtering (in any straightforward way I can think of), but once you have a grouping iterator, it simply becomes summing the lengths of the "0" runs.
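One caveat with grouping on "0" runs: two adjacent 1s produce no "0" group, so a jump of 1 is silently dropped (0b1101 gives [2] instead of [2, 1]), and zeros below the lowest 1 add a spurious entry. A sketch that avoids both by swapping in str.split instead of groupby, with jumps as a hypothetical function name:

```python
def jumps(number):
    # Binary digits read from the rightmost (least significant) bit.
    bits = bin(number)[2:][::-1]
    # Drop zeros outside the first and last 1, then split on '1':
    # the pieces strictly between consecutive 1s are the zero runs,
    # which are empty strings when two 1s are adjacent.
    gaps = bits.strip("0").split("1")[1:-1]
    return [len(gap) + 1 for gap in gaps]


print(jumps(169))     # [3, 2, 2]
print(jumps(0b1101))  # [2, 1]
```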
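You can easily do this with a regex pattern, applied to the reversed binary string from the question (without its lowest bit):
import re
number = 169
num = bin(number >> 1)[:1:-1]  # '0010101'
out = [len(i) for i in re.findall("0*1", num)]
print(out)
Output:
[3, 2, 2]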

Which is more efficient, more code or more conditional checks?

I am doing the 7th exercise in 10 Algorithms To Solve Before your Python Coding Interview. It is about moving the zeroes of a list to the end. I thought of writing a function that moves those zeroes to the end or the start based on a boolean argument.
def move_zeroes(numbers, to_start = False):
    pass
The idea in my mind for the move-to-end case was this.
def move_zeroes(numbers):
    for i in numbers:
        if i == 0:
            numbers.remove(i)
            numbers.append(i)
Extending this, I came across two choices.
More code, fewer conditional checks
def move_zeroes(numbers, to_start = False):
    if to_start:
        for i in numbers:
            if i == 0:
                numbers.remove(i)
                numbers.insert(0, i)
    else:
        for i in numbers:
            if i == 0:
                numbers.remove(i)
                numbers.append(i)
Less code, more conditional checks
def move_zeroes(numbers, to_start = False):
    for i in numbers:
        if i == 0:
            numbers.remove(i)
            numbers.insert(0, i) if to_start else numbers.append(i)
Does there exist a way between these? Can I have less code and fewer conditional checks? Also, how does this scale to longer lists?
As a secondary question, in case this is not possible, what is the practical way to do this particular example, keeping in mind memory and extra lines? In other words, which of the two trade-offs is better?
EDIT 1: Renamed the list parameter to numbers at the suggestion of @aaossa.
Continually calling remove() (notwithstanding the fact that you shouldn't do that on a list that you're currently iterating over) could be very inefficient as the list has to be scanned from its first element until such time as the remove criterion has been matched.
I propose this:
alist = [1,0,0,2,0,3,0,4,0,5]
def move_zeroes(lst):
    # build a list from all the non-zero elements of the input list
    newlist = list(filter(None, lst))
    # the difference in the lengths of the original and new list
    # will be the number of zeroes that were not copied
    nz = len(lst) - len(newlist)
    # now extend the new list with a list of zeroes of appropriate
    # length and copy into the address space of the given parameter
    lst[:] = newlist + [0] * nz
move_zeroes(alist)
print(alist)
Output:
[1, 2, 3, 4, 5, 0, 0, 0, 0, 0]
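Building on this, a sketch of the question's move_zeroes(numbers, to_start=False) signature using the same filter-and-pad idea: the flag is tested once per call rather than once per element, and the work is a single O(n) pass plus the copy back into the original list. It uses an explicit x != 0 test in place of filter(None, ...), which would also drop values such as None.

```python
def move_zeroes(numbers, to_start=False):
    # Keep the non-zero values in their original order.
    nonzero = [x for x in numbers if x != 0]
    zeros = [0] * (len(numbers) - len(nonzero))
    # The flag is checked once, not once per element.
    numbers[:] = zeros + nonzero if to_start else nonzero + zeros


alist = [1, 0, 0, 2, 0, 3, 0, 4, 0, 5]
move_zeroes(alist, to_start=True)
print(alist)  # [0, 0, 0, 0, 0, 1, 2, 3, 4, 5]
```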
list2 = [1, 0, 0, 2, 0, 3, 0, 4, 0, 5]
temp = 0
for i in range(len(list2)):
    for k in range(i+1, len(list2)):
        if list2[i] == 0 and list2[i] < list2[k]:
            temp = list2[i]
            list2[i] = list2[k]
            list2[k] = temp
print(list2)
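Another option, which removes the per-element branching entirely, is a stable sort on a boolean key. This is a different technique from both answers above and costs O(n log n) in general rather than O(n), but it keeps the function to one line and, unlike the comparison list2[i] < list2[k], does not rely on the non-zero values being positive. A sketch:

```python
def move_zeroes(numbers, to_start=False):
    # sorted() is stable, so non-zero values keep their relative order.
    # Elements whose key is False come first:
    #   to_start=False -> key is (x == 0), so zeros sort last
    #   to_start=True  -> key is (x != 0), so zeros sort first
    numbers[:] = sorted(numbers, key=lambda x: (x == 0) != to_start)


alist = [1, 0, 0, 2, 0, 3, 0, 4, 0, 5]
move_zeroes(alist)
print(alist)  # [1, 2, 3, 4, 5, 0, 0, 0, 0, 0]
```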

How to repeat an operation in a for loop n times

The first input of following code is a single value like a=3, and second input is an array consisting of pairs of values like: B = [[1000, 1], [1000, 3], [999, 4]]. If the first value of each pair is even then I want the corresponding output value to be based on some specific calculation as shown in the code; if the first value is odd, there is another calculation as shown in code. The second value in the pair is the number of times I'd like to repeat the calculation for the pair. Calculations in the example should be repeated 1, 3 and 4 times for pairs 1, 2 and 3, respectively.
I am not sure how to repeat the calculation.
import numpy as np
a = np.array(input(), dtype=int)
B = []
for i in range(a):
    b = np.array(input().split(), dtype=int)
    B.append(b)
B = np.array(B)
C = []
for i in range(a):
    if B[i, 0] % 2==0:
        c = (B[i, 0] - 99) * 3
        C.append(c)
    else:
        if B[i, 0] % 2 == 1:
            d = (B[i, 0] - 15) * 2
            C.append(d)
Ignoring the input, let's say you have an array
B = np.array([[1000, 1], [1000, 3], [999, 4]])
You can perform selective computations using a mask. The quickest way to type is probably using np.where:
C = np.where(B[:, 0] % 2, 2 * (B[:, 0] - 15), 3 * (B[:, 0] - 99))
This is a bit inefficient since it computes both values for each element. A much more efficient, but uglier, method would be to do everything in-place:
b = B[:, 0]
mask = (b % 2 == 0)
C = np.empty_like(b)
np.subtract(b, 99, out=C, where=mask)
np.subtract(b, 15, out=C, where=~mask)
np.multiply(C, 3, out=C, where=mask)
np.multiply(C, 2, out=C, where=~mask)
Numpy then provides the repeat function to do the repetition after you've computed the output values:
C = np.repeat(C, B[:, 1])
You can write the whole thing as a one-liner:
C = np.repeat(np.where(B[:, 0] % 2, 2 * (B[:, 0] - 15), 3 * (B[:, 0] - 99)), B[:, 1])
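A quick check, using the example B from the question, that the np.where version and the in-place version agree, and what the repeated output looks like:

```python
import numpy as np

B = np.array([[1000, 1], [1000, 3], [999, 4]])
b = B[:, 0]

# np.where version: odd -> 2 * (b - 15), even -> 3 * (b - 99)
C_where = np.where(b % 2, 2 * (b - 15), 3 * (b - 99))

# In-place version using the ufunc out= and where= arguments
mask = (b % 2 == 0)
C_inplace = np.empty_like(b)
np.subtract(b, 99, out=C_inplace, where=mask)
np.subtract(b, 15, out=C_inplace, where=~mask)
np.multiply(C_inplace, 3, out=C_inplace, where=mask)
np.multiply(C_inplace, 2, out=C_inplace, where=~mask)

# Repeat each result B[:, 1] times: 2703 appears 1 + 3 times, 1968 four times
print(np.repeat(C_where, B[:, 1]))
```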
Probably a case for nested for loops. You have your outer loop, then loop through your calculations a certain number of times. So, from your example, try something like:
for i in range(a):
    for x in range(B[i][1]):
        if B[i][0] % 2 == 0:
            B[i][0] = (B[i][0] - 99) * 3
        else:
            B[i][0] = (B[i][0] - 15) * 2
        C.append(B[i][0])
Edit: If you want to repeat the calculation on the same value, just store the result back into B and then append that element of B to C.
One other note, not really related to your question: since the number will be either even or odd, you don't need the second if statement inside your else. If you really do need to check another condition explicitly, use Python's elif statement (short for "else if"), which only checks its condition if none of the previous conditions matched.
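If "repeat" means emitting the same result several times (the interpretation that np.repeat uses in the other answer), plain Python needs only one extra for clause in a list comprehension. A sketch with transform as a hypothetical helper name, using a plain else as suggested above:

```python
def transform(x):
    # Even and odd are the only two cases, so a plain else is enough.
    if x % 2 == 0:
        return (x - 99) * 3
    else:
        return (x - 15) * 2


B = [[1000, 1], [1000, 3], [999, 4]]
# One output per repetition: the second value of each pair is the count.
C = [transform(x) for x, n in B for _ in range(n)]
print(C)
```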

Logical iteration over numpy array

Suppose that you have two equal-sized lists. The first list contains only zeros and ones, and the initial value of the second list is equal to some fixed number. The other values of the second array depend on the same-indexed values of the first list: if the value in the first list is 0, the same-indexed value of the second list equals the preceding one; in all other cases, it is equal to some other value. To clarify my question, I've written the code below with a for loop. How can I solve this kind of problem without a for loop?
Code
import numpy as np
a = np.array([0, 1, 0, 0, 0, 1, 0, 0, 1])
b = np.zeros_like(a)
b[0] = 5
for i in range(1, a.size):
    if a[i] == 0:
        b[i] = b[i-1]
    else:
        b[i] = np.random.randint(5)
Here's a vectorized approach -
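offset = int(a[0]!=0)  # 1 if a[0] is non-zero: b[0] still gets the fixed 5, not a random draw
N = (np.count_nonzero(a!=0)) - offset # no. of rand num to be generated
rand_num = np.append(5,np.random.randint(0,5,N))  # fixed start value followed by the N draws
out = rand_num[(a!=0).cumsum() - offset]  # each position takes the draw of the latest non-zero at or before it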
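Because the random draws make the two versions hard to compare directly, this sketch feeds both the loop and the vectorized approach the same pre-generated values (the draws array is an assumption standing in for np.random.randint) and checks that they agree, including the case where a[0] is non-zero. The function names are hypothetical.

```python
import numpy as np

def loop_version(a, start, draws):
    # The question's loop, consuming pre-generated draws in order.
    b = np.zeros_like(a)
    b[0] = start
    it = iter(draws)
    for i in range(1, a.size):
        b[i] = b[i - 1] if a[i] == 0 else next(it)
    return b

def vectorized_version(a, start, draws):
    # The answer's approach with the same draws instead of fresh random numbers.
    offset = int(a[0] != 0)
    n = np.count_nonzero(a != 0) - offset
    values = np.append(start, np.asarray(draws)[:n])
    return values[(a != 0).cumsum() - offset]

a = np.array([0, 1, 0, 0, 0, 1, 0, 0, 1])
draws = np.array([7, 8, 9])
print(loop_version(a, 5, draws))        # [5 7 7 7 7 8 8 8 9]
print(vectorized_version(a, 5, draws))  # same values
```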

Numpy broadcasting with comparison operator; cyclic iteration

I have implemented a cyclic iteration function in two ways:
def Spin1(n, N):  # n - current state, N - highest state
    value = n + 1
    case1 = (value > N)
    case2 = (value <= N)
    return case1 * 0 + case2 * value
def Spin2(n, N):
    value = n + 1
    if value > N:
        return 0
    else:
        return value
These functions return identical results. However, the second function cannot broadcast over a NumPy array. So to test the first function I run this:
import numpy
AR1 = numpy.zeros((3, 4), dtype = numpy.uint32)
AR1[1,2] = 5
print(AR1)
print(Spin1(AR1,5))
Magically it works, and that is so sweet. So I see exactly what I want:
[[0 0 0 0]
[0 0 5 0]
[0 0 0 0]]
[[1 1 1 1]
[1 1 0 1]
[1 1 1 1]]
Now, with the second function, print(Spin2(AR1, 5)) fails with this error:
if value > N
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
And it's clear why, since an if statement on an array is nonsense. So for now I just use the first variant. But looking at these functions, I have a strong feeling that the first one does many more mathematical operations, so I haven't given up hope of optimising it.
Questions:
1. Is it possible to optimise the function Spin1 to do fewer operations, or how do I use the function Spin2 in broadcasting mode (ideally without making my code too ugly)? Extra question: What would be the fastest way to do that manipulation with an array?
2. Is there some standard Python function which does the same calculation (even if not broadcasting-capable), and what is it correctly called, "cyclic increment" probably?
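On question 2: as long as every state stays within 0..N, the operation is the usual modular ("cyclic") increment, which Python's % operator expresses directly and which broadcasts over NumPy arrays without any comparison. A sketch with spin_mod as a hypothetical name; it assumes no state ever exceeds N, which is where it differs from Spin1 (with N = 5, Spin1 maps 7 to 0, while the modulo maps it to 2).

```python
import numpy as np

def spin_mod(n, N):
    # Cyclic increment: N wraps around to 0. Matches Spin1/Spin2 only when
    # every state already lies in 0..N.
    return (n + 1) % (N + 1)


AR1 = np.zeros((3, 4), dtype=np.uint32)
AR1[1, 2] = 5
print(spin_mod(AR1, 5))
print(spin_mod(5, 5), spin_mod(2, 5))  # 0 3
```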
There is a numpy function for this: np.where:
In [590]: AR1
Out[590]:
array([[0, 0, 0, 0],
[0, 0, 5, 0],
[0, 0, 0, 0]], dtype=uint32)
In [591]: np.where(AR1 >= 5, 0, 1)
Out[591]:
array([[1, 1, 1, 1],
[1, 1, 0, 1],
[1, 1, 1, 1]])
So, you could define:
def Spin1(n, N):
    value = n + 1
    return np.where(value > N, 0, value)
NumPy also provides np.vectorize, which wraps a normal Python function so that it can be called on arrays like a ufunc:
def Spin2(n, N):
    value = n + 1
    if value > N:
        return 0
    else:
        return value
Spin2 = np.vectorize(Spin2)
So that you can now call Spin2 on arrays:
In [595]: Spin2(AR1, 5)
Out[595]:
array([[1, 1, 1, 1],
[1, 1, 0, 1],
[1, 1, 1, 1]])
However, np.vectorize mainly provides syntactic sugar. There is still a Python function call for each array element, which makes np.vectorize-d functions no faster than equivalent code using Python for-loops.
Your Spin1 follows a well established pattern in array oriented languages (e.g. APL, MATLAB) for 'vectorizing' a function like Spin2. You create one or more booleans (or 0/1 arrays) to represent the various states the array elements can take, and then construct the output by multiplication and summation.
For example, to avoid divide-by-zero problems, I have used:
1/(x+(x==0))
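To make the trick concrete: adding the boolean array (x == 0) bumps only the zero entries to 1, so the division never sees a zero. Those positions then hold 1.0, which the caller may still need to mask out or replace afterwards.

```python
import numpy as np

x = np.array([0.0, 2.0, 4.0])
# (x == 0) is True (1) exactly where x is 0, so those entries become 1/1 instead of 1/0.
result = 1 / (x + (x == 0))
print(result)  # values 1.0, 0.5, 0.25
```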
A variation on this is to use a boolean index array to select array elements that should be changed. In this case, you want to return value, but with selected elements 'rolled over'.
def Spin3(n, N):  # n - current state, N - highest state
    value = n + 1
    value[value>N] = 0
    return value
In this case, the indexing approach is simpler, and seems to fit the program logic better. It may be faster, but I can't guarantee that. It's good to keep both approaches in mind.
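One way to check the speed question for yourself is timeit. This sketch times copies of the np.where-based Spin1 and the mask-based Spin3 under hypothetical names, on an arbitrary 300x400 array; the timings it prints depend on the machine and NumPy version, so no particular ratio is implied.

```python
import timeit
import numpy as np

def spin_where(n, N):
    value = n + 1
    return np.where(value > N, 0, value)

def spin_mask(n, N):
    value = n + 1          # new array, so the input is not modified
    value[value > N] = 0
    return value

AR1 = np.random.randint(0, 6, size=(300, 400)).astype(np.uint32)

# Both variants must agree before their timings mean anything.
assert np.array_equal(spin_where(AR1, 5), spin_mask(AR1, 5))

for f in (spin_where, spin_mask):
    t = timeit.timeit(lambda: f(AR1, 5), number=200)
    print(f.__name__, t)
```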
I'm posting some feedback as an answer so as not to clutter the question. I've done timing tests on various functions, and it turns out that assigning through a boolean mask is the fastest variant in this case (hpaulj's answer). np.where was 1.4 times slower and np.vectorize(Spin2) was 15 times slower. Just out of curiosity, I wanted to test this with loops, so I wrote this algorithm for testing:
import numpy
# rows, cols and the counter d are set before this point (values not shown)
AR1 = numpy.zeros((rows, cols), dtype = numpy.uint32)
while d <= 100:
    Buf = numpy.zeros_like(AR1)
    r = 0
    c = 0
    while (r < rows):
        while (c < cols):
            temp = AR1[r, c] + 1
            if temp > 5:
                Buf[r, c] = 1
            else:
                Buf[r, c] = temp
            c += 1
        r += 1
        c = 0
    AR1 = Buf
    d += 1
I am not sure, but it seems to be a very straightforward implementation of all the functions mentioned above. But it is sooo slow, almost 300 times slower. I have read similar questions on SO, but I still don't get it: WHY is it so? And what exactly causes this slowdown? Here I have intentionally created a buffer to avoid reading and writing the same elements, and I do no memory clean-up. What could be simpler? I am confused. I don't want to open a new question, since this has been asked a few times already, so perhaps someone can comment or has good links that clarify this?
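Much of the slowdown comes from doing the per-element work in the Python interpreter: each AR1[r, c] read and Buf[r, c] write is a separate indexing call that creates or unboxes a NumPy scalar, on top of the bytecode for the comparison and the loop counters, whereas a whole-array expression runs its loop in compiled code. The sketch below (hypothetical function names, small arbitrary sizes) shows that one pass of the loop above, including its rollover to 1, is equivalent to two whole-array operations.

```python
import numpy as np

def step_loop(AR1, rows, cols):
    # Element-by-element version of one pass of the benchmark loop.
    Buf = np.zeros_like(AR1)
    for r in range(rows):
        for c in range(cols):
            temp = AR1[r, c] + 1
            Buf[r, c] = 1 if temp > 5 else temp
    return Buf

def step_vectorized(AR1):
    # Same pass as whole-array operations.
    Buf = AR1 + 1
    Buf[Buf > 5] = 1
    return Buf

rows, cols = 20, 30
A = np.zeros((rows, cols), dtype=np.uint32)
B = A.copy()
for _ in range(100):
    A = step_loop(A, rows, cols)
    B = step_vectorized(B)
print(np.array_equal(A, B))  # True
```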
