I have a sequence like: 4, 8, 16, 32, 64, 128, 256, 512
Now suppose I have a number
a = 5
then
b = 8
meaning b is the closest higher number from the sequence with respect to a.
Now if a = x, then what is b?
I offer two solutions. The first is probably the most obvious and intuitive. The second is more advanced but more efficient.
Simple and intuitive
Here is a simple intuitive approach. The following function returns the closest number greater than or equal to the argument num in the sequence 4, 8, 16, 32, 64, .... The function first assigns n to 4. Then, so long as n is strictly less than the argument num, n is assigned the next value in the sequence and the comparison is made again. Once n is greater than or equal to num, we return n.
def seq_1(num):
    """Returns the closest number greater than or equal to num in the
    sequence 4, 8, 16, 32, 64, ...
    """
    n = 4
    while n < num:
        n *= 2
    return n
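For example, a quick check of seq_1 above:
print(seq_1(5))   # 8
print(seq_1(64))  # 64, since 64 is already in the sequence
print(seq_1(1))   # 4, the first element of the sequence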
More efficient, but more advanced
A less intuitive but more efficient approach is obtained by first recognizing that the sequence is defined by
a_0 = 4;
a_n = 2 * a_(n-1) for n in {1, 2, 3, ...}.
Notice how a_(n-1) = 2 * a_(n-2). Substituting this into a_n = 2 * a_(n-1), we obtain a_n = (2 ** 2) * a_(n-2). More generally, through repeated substitutions, we obtain a_n = (2 ** n) * a_(0) or a_n = (2 ** n) * 4 or
a_n = (2 ** (n + 2)) for n in {0, 1, 2, 3, ...}
So the first element a_0 is 2 ** 2 = 4, the second element a_1 is 2 ** 3 = 8, the third is 2 ** 4 = 16 and so on.
This suggests the solution:
def seq_2(num):
    """Returns the closest number greater than or equal to num in the
    sequence 4, 8, 16, 32, 64, ...
    """
    if num < 4:
        return 4
    return 1 << (num - 1).bit_length()
If num is less than 4, we return 4.
1 << (num - 1).bit_length() evaluates to the closest power of 2 greater than or equal to num.
This requires the following knowledge:
2 is 10 in binary, 2 ** 2 is 100 in binary, 2 ** 3 is 1000 in binary, ..., 2 ** n in binary is 1 followed by n zeros.
bit_length() is a method defined for Python ints that "[r]eturn[s] the number of bits necessary to represent an integer in binary" (docs).
i << j shifts i left by j bits. For example
In [1]: bin(1)
Out[1]: '0b1' # 2 to the power of 0.
In [2]: bin(1 << 5)
Out[2]: '0b100000' # 2 to the power of 5.
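Putting the pieces together for num = 5, continuing the same session:
In [3]: (5 - 1).bit_length()
Out[3]: 3 # 4 is 0b100, which needs 3 bits.
In [4]: 1 << (5 - 1).bit_length()
Out[4]: 8 # the answer for a = 5.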
Timings
# seq_1(420_000)
657 ns ± 0.55 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
# seq_2(420_000)
116 ns ± 0.416 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
Provided that your sequence is in ascending sorted order:
seq = [4, 8, 16, 32, 64, 128, 256, 512]

def get_next_highest(seq, a):
    b = None
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] <= a:
            break
        b = seq[i]
    return b

print(get_next_highest(seq, 5))
If you have a sorted sequence you can use bisect from Python's standard library.
import bisect
data = [4, 8, 16, 32, 64, 128, 256, 512]
a = 5
index = bisect.bisect(data, a)
b = data[index]
You will have to add a boundary check if you expect a to have a value larger than the last element of the list.
This code demonstrates how it works with random values.
import bisect
import random

def get_next_highest(data, value):
    try:
        return data[bisect.bisect(data, value)]
    except IndexError:
        return None

def main():
    data = sorted([random.randint(1, 100) for _ in range(10)])
    a = 50
    b = get_next_highest(data, a)
    print(data, a, b)

if __name__ == '__main__':
    main()
Problem statement:
As stated by the title, I want to remove parts from a 1D array that have consecutive zeros and length equal to or above a threshold.
My solution:
I produced the solution shown in the following MRE:
import numpy as np
THRESHOLD = 4
a = np.array((1,1,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,1))
print("Input: " + str(a))
# Find the indices of the parts that meet threshold requirement
gaps_above_threshold_inds = np.where(np.diff(np.nonzero(a)[0]) - 1 >= THRESHOLD)[0]
# Delete these parts from array
for idx in gaps_above_threshold_inds:
    a = np.delete(a, list(range(np.nonzero(a)[0][idx] + 1, np.nonzero(a)[0][idx + 1])))
print("Output: " + str(a))
Output:
Input: [1 1 0 1 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1]
Output: [1 1 0 1 1 1 0 0 0 1 1]
Question:
Is there a less complicated and more efficient way to do this on a numpy array?
Edit:
Based on @mozway's comments, I'm editing my question to provide some more information.
Basically, the problem domain is:
I have 1D signals of length ~20,000 samples
Some parts of the signals have been zeroed due to noise
The rest of the signal has non-zero values, in the range ~[50, 250]
Leading and trailing zeros have been removed
My goal is to remove the zero parts above a length threshold as I have already said.
More detailed questions:
As far as numpy efficient handling is concerned, is there a better solution from the one above?
As far as efficient signal processing techniques are concerned, is there a more suitable way to achieve the desired goal than using numpy?
Comments on answers:
Regarding my first concern about efficient numpy handling, @mathfux's solution is really great and basically what I was looking for. That's why I accepted this one.
However, the approach by @Jérôme Richard answers my second question and presents a really high performance solution; really useful if the dataset is extremely big.
Thanks for your great answers!
np.delete creates a new array every time it is called, which is very inefficient. A faster solution is to store all the values to keep in a mask/boolean array and then filter the input array at once. However, this will still likely require a pure-Python loop if done only with Numpy. A simpler and faster solution is to use Numba (or Cython) to do that. Here is an implementation:
import numpy as np
import numba as nb

@nb.njit('int_[:](int_[:], int_)')
def filterZeros(arr, threshold):
    n = len(arr)
    res = np.empty(n, dtype=arr.dtype)
    count = 0
    j = 0
    for i in range(n):
        if arr[i] == 0:
            count += 1
        else:
            if count >= threshold:
                j -= count
            count = 0
        res[j] = arr[i]
        j += 1
    if n > 0 and arr[n-1] == 0 and count >= threshold:
        j -= count
    return res[0:j]
a = np.array((1,1,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,1))
a = filterZeros(a, 4)
print("Output: " + str(a))
Here are the results with a random binary array containing 100_000 items on my machine:
Reference implementation: 5982 ms
Mozway's solution: 23.4 ms
This implementation: 0.11 ms
Thus, this solution is about 54381 times faster than the initial solution and 212 times faster than Mozway's. The code can even be ~30% faster by working in-place (destroying the input array) and by telling Numba the array is contiguous in memory (using ::1 instead of :).
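As a rough sketch of what those two changes could look like (this is my reading of the remark above, not code from the original benchmark), the signature declares C-contiguous arrays with ::1 and the result is written back into the input array:
@nb.njit('int_[::1](int_[::1], int_)')
def filterZeros_inplace(arr, threshold):
    n = len(arr)
    count = 0
    j = 0
    for i in range(n):
        if arr[i] == 0:
            count += 1
        else:
            if count >= threshold:
                j -= count
            count = 0
        arr[j] = arr[i]  # overwrite the input instead of allocating a new array (j never exceeds i)
        j += 1
    if n > 0 and arr[n - 1] == 0 and count >= threshold:
        j -= count
    return arr[0:j]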
It's also possible to find the differences between the indices of nonzero items, fix the ones that exceed the threshold, and reconstruct the sequence in a correct way.
def numpy_fix(a):
    # STEP 1. Find indices of nonzero items: [0 1 3 8 9 13 19]
    idx = np.flatnonzero(a)
    # STEP 2. Find differences along these indices (also insert a leading zero): [0 1 2 5 1 4 6]
    df = np.diff(idx, prepend=0)
    # STEP 3. Fix differences of indices larger than THRESHOLD: [0 1 2 1 1 4 1]
    df[df > THRESHOLD] = 1
    # STEP 4. Given differences on indices, reconstruct indices themselves: [0 1 3 4 5 9 10]
    cs = np.cumsum(df)
    z = np.zeros(cs[-1] + 1, dtype=int)  # create an array of zeros
    z[cs] = 1  # pad it with ones at the indices found
    return z
>>> numpy_fix(a)
array([1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1])
(Note that it's correct only if a has no leading or trailing zeros)
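If your data might still contain leading or trailing zeros, one option (my own suggestion, not part of the original answer) is to strip them first so this assumption holds:
a = np.trim_zeros(a)  # removes leading and trailing zeros from a 1D array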
%timeit numpy_fix(np.tile(a, (1, 50000)))
39.3 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
A quite efficient method is to use itertools.groupby+itertools.chain:
from itertools import groupby, chain
a2 = np.array(list(chain(*(l for k, g in groupby(a)
                           if len(l := list(g)) < THRESHOLD or k))))
output:
array([1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1])
This works relatively fast, for instance on 1 million items:
# a = np.random.randint(2, size=1000000)
%%timeit
np.array(list(chain(*(l for k, g in groupby(a)
                      if len(l := list(g)) < THRESHOLD or k))))
# 254 ms ± 3.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I have come across some code which uses torch.einsum to compute a tensor multiplication. I am able to understand how it works for lower order tensors, but not for the 4D tensor below:
import torch
a = torch.rand((3, 5, 2, 10))
b = torch.rand((3, 4, 2, 10))
c = torch.einsum('nxhd,nyhd->nhxy', [a,b])
print(c.size())
# output: torch.Size([3, 2, 5, 4])
I need help regarding:
What is the operation that has been performed here (explanation for how the matrices were multiplied/transposed etc.)?
Is torch.einsum actually beneficial in this scenario?
(Skip to the tl;dr section if you just want the breakdown of steps involved in an einsum)
I'll try to explain how einsum works step by step for this example but instead of using torch.einsum, I'll be using numpy.einsum (documentation), which does exactly the same but I am just, in general, more comfortable with it. Nonetheless, the same steps happen for torch as well.
Let's rewrite the above code in NumPy -
import numpy as np
a = np.random.random((3, 5, 2, 10))
b = np.random.random((3, 4, 2, 10))
c = np.einsum('nxhd,nyhd->nhxy', a,b)
c.shape
#(3, 2, 5, 4)
Step by step np.einsum
Einsum is composed of 3 steps: multiply, sum and transpose
Let's look at our dimensions. We have a (3, 5, 2, 10) and a (3, 4, 2, 10) that we need to bring to (3, 2, 5, 4) based on 'nxhd,nyhd->nhxy'
1. Multiply
Let's not worry about the order of the n, x, y, h, d axes, and just focus on whether we want to keep each of them or remove (reduce) it. Writing them down as a table, let's see how we can arrange our dimensions -
## Multiply ##
         n   x   y   h    d
       ----------------------
a   ->   3   5       2   10
b   ->   3       4   2   10
c1  ->   3   5   4   2   10
To get the broadcasting multiplication between the x and y axes resulting in (x, y), we have to add new axes in the right places and then multiply.
a1 = a[:,:,None,:,:] #(3, 5, 1, 2, 10)
b1 = b[:,None,:,:,:] #(3, 1, 4, 2, 10)
c1 = a1*b1
c1.shape
#(3, 5, 4, 2, 10) #<-- (n, x, y, h, d)
2. Sum / Reduce
Next, we want to reduce the last axis 10. This will get us the dimensions (n,x,y,h).
## Reduce ##
         n   x   y   h    d
       ----------------------
c1  ->   3   5   4   2   10
c2  ->   3   5   4   2
This is straightforward. Let's just do np.sum over axis=-1.
c2 = np.sum(c1, axis=-1)
c2.shape
#(3,5,4,2) #<-- (n, x, y, h)
3. Transpose
The last step is rearranging the axes using a transpose. We can use np.transpose for this. c2.transpose(0,3,1,2) basically brings the 3rd axis after the 0th axis and pushes the 1st and 2nd back. So, (n,x,y,h) becomes (n,h,x,y).
c3 = c2.transpose(0,3,1,2)
c3.shape
#(3,2,5,4) #<-- (n, h, x, y)
4. Final check
Let's do a final check and see if c3 is the same as the c which was generated from the np.einsum -
np.allclose(c,c3)
#True
TL;DR.
Thus, we have implemented the 'nxhd , nyhd -> nhxy' as -
input -> nxhd, nyhd
multiply -> nxyhd #broadcasting
sum -> nxyh #reduce
transpose -> nhxy
Advantage
The advantage of np.einsum over performing the multiple steps separately is that you can choose the "path" it takes to do the computation and perform multiple operations with the same function. This can be done via the optimize parameter, which will optimize the contraction order of an einsum expression.
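For illustration (my own sketch, not from the original answer), the contraction path can be inspected with np.einsum_path and then reused in the actual call:
# Inspect the contraction order einsum would use for this expression.
path, info = np.einsum_path('nxhd,nyhd->nhxy', a, b, optimize='optimal')
# Reuse the precomputed path in the actual computation.
c = np.einsum('nxhd,nyhd->nhxy', a, b, optimize=path)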
A non-exhaustive list of these operations, which can be computed by einsum, is shown below along with examples:
Trace of an array, numpy.trace.
Return a diagonal, numpy.diag.
Array axis summations, numpy.sum.
Transpositions and permutations, numpy.transpose.
Matrix multiplication and dot product, numpy.matmul numpy.dot.
Vector inner and outer products, numpy.inner numpy.outer.
Broadcasting, element-wise and scalar multiplication, numpy.multiply.
Tensor contractions, numpy.tensordot.
Chained array operations, inefficient calculation order, numpy.einsum_path.
Benchmarks
%%timeit
np.einsum('nxhd,nyhd->nhxy', a,b)
#8.03 µs ± 495 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
np.sum(a[:,:,None,:,:]*b[:,None,:,:,:], axis=-1).transpose(0,3,1,2)
#13.7 µs ± 1.42 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
It shows that np.einsum does the operation faster than individual steps.
For example, if there are 5 numbers 1, 2, 3, 4, 5
I want a random result like
[[ 2, 3, 1, 4, 5]
[ 5, 1, 2, 3, 4]
[ 3, 2, 4, 5, 1]
[ 1, 4, 5, 2, 3]
[ 4, 5, 3, 1, 2]]
Ensure every number is unique in its row and column.
Is there an efficient way to do this?
I've tried to use while loops to generate one row for each iteration, but it seems not so efficient.
import numpy as np

numbers = list(range(1, 6))
result = np.zeros((5, 5), dtype='int32')
row_index = 0
while row_index < 5:
    np.random.shuffle(numbers)
    for column_index, number in enumerate(numbers):
        if number in result[:, column_index]:
            break
    else:
        result[row_index, :] = numbers
        row_index += 1
Just for your information, what you are looking for is a way of generating latin squares.
As for the solution, it depends on how "random" you need the result to be.
I can think of at least four main techniques, two of which have already been proposed.
Hence, I will briefly describe the other two:
loop through all possible permutations of the items and accept the first that satisfy the unicity constraint along rows
use only cyclic permutations to build subsequent rows: these are by construction satisfying the unicity constraint along rows (the cyclic transformation can be done forward or backward); for improved "randomness" the rows can be shuffled
Assuming we work with standard Python data types since I do not see a real merit in using NumPy (but results can be easily converted to np.ndarray if necessary), this would be in code (the first function is just to check that the solution is actually correct):
import random
import math
import itertools

# this only works for Iterable[Iterable]
def is_latin_rectangle(rows):
    valid = True
    for row in rows:
        if len(set(row)) < len(row):
            valid = False
    if valid and rows:
        for i, val in enumerate(rows[0]):
            col = [row[i] for row in rows]
            if len(set(col)) < len(col):
                valid = False
                break
    return valid
def is_latin_square(rows):
    return is_latin_rectangle(rows) and len(rows) == len(rows[0])
# : prepare the input
n = 9
items = list(range(1, n + 1))
# shuffle items
random.shuffle(items)
# number of permutations
print(math.factorial(n))
def latin_square1(items, shuffle=True):
    result = []
    for elems in itertools.permutations(items):
        valid = True
        for i, elem in enumerate(elems):
            orthogonals = [x[i] for x in result] + [elem]
            if len(set(orthogonals)) < len(orthogonals):
                valid = False
                break
        if valid:
            result.append(elems)
    if shuffle:
        random.shuffle(result)
    return result

rows1 = latin_square1(items)
for row in rows1:
    print(row)
print(is_latin_square(rows1))
def latin_square2(items, shuffle=True, forward=False):
    sign = -1 if forward else 1
    result = [items[sign * i:] + items[:sign * i] for i in range(len(items))]
    if shuffle:
        random.shuffle(result)
    return result

rows2 = latin_square2(items)
for row in rows2:
    print(row)
print(is_latin_square(rows2))

rows2b = latin_square2(items, False)
for row in rows2b:
    print(row)
print(is_latin_square(rows2b))
For comparison, an implementation that tries random permutations and accepts valid ones (fundamentally what @hpaulj proposed) is also presented.
def latin_square3(items):
    result = [list(items)]
    while len(result) < len(items):
        new_row = list(items)
        random.shuffle(new_row)
        result.append(new_row)
        if not is_latin_rectangle(result):
            result = result[:-1]
    return result

rows3 = latin_square3(items)
for row in rows3:
    print(row)
print(is_latin_square(rows3))
I did not have time (yet) to implement the other method (the backtracking, Sudoku-like approach from @ConfusedByCode).
With timings for n = 5:
%timeit latin_square1(items)
321 µs ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit latin_square2(items)
7.5 µs ± 222 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit latin_square2(items, False)
2.21 µs ± 69.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit latin_square3(items)
2.15 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
... and for n = 9:
%timeit latin_square1(items)
895 ms ± 18.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit latin_square2(items)
12.5 µs ± 200 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit latin_square2(items, False)
3.55 µs ± 55.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit latin_square3(items)
The slowest run took 36.54 times longer than the fastest. This could mean that an intermediate result is being cached.
9.76 s ± 9.23 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
So, solution 1 gives a fair deal of randomness but it is not terribly fast (it scales with O(n!)), solutions 2 (and 2b) are much faster (scaling with O(n)) but not as random as solution 1. Solution 3 is very slow and its performance can vary significantly (it can probably be sped up by computing the last row instead of guessing it).
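As a sketch of that last idea (my own variation, reusing is_latin_rectangle from above and assuming n >= 2): generate all rows but the last by rejection, then fill the final row with, for each column, the single value not yet used in that column.
def latin_square3b(items):
    result = [list(items)]
    while len(result) < len(items) - 1:
        new_row = list(items)
        random.shuffle(new_row)
        result.append(new_row)
        if not is_latin_rectangle(result):
            result = result[:-1]
    # The last row is forced: each column is missing exactly one value.
    full = set(items)
    result.append([(full - {row[i] for row in result}).pop()
                   for i in range(len(items))])
    return result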
Getting more technical, other efficient algorithms are discussed in:
Jacobson, M. T. and Matthews, P. (1996), Generating uniformly distributed random latin squares. J. Combin. Designs, 4: 405-437. doi:10.1002/(SICI)1520-6610(1996)4:6<405::AID-JCD3>3.0.CO;2-J
This may seem odd, but you have basically described generating a random n-dimension Sudoku puzzle. From a blog post by Daniel Beer:
The basic approach to solving a Sudoku puzzle is by a backtracking search of candidate values for each cell. The general procedure is as follows:
Generate, for each cell, a list of candidate values by starting with the set of all possible values and eliminating those which appear in the same row, column and box as the cell being examined.
Choose one empty cell. If none are available, the puzzle is solved.
If the cell has no candidate values, the puzzle is unsolvable.
For each candidate value in that cell, place the value in the cell and try to recursively solve the puzzle.
There are two optimizations which greatly improve the performance of this algorithm:
When choosing a cell, always pick the one with the fewest candidate values. This reduces the branching factor. As values are added to the grid, the number of candidates for other cells reduces too.
When analysing the candidate values for empty cells, it's much quicker to start with the analysis of the previous step and modify it by removing values along the row, column and box of the last-modified cell. This is O(N) in the size of the puzzle, whereas analysing from scratch is O(N³).
In your case an "unsolvable puzzle" is an invalid matrix. Every element in the matrix will be unique on both axis in a solvable puzzle.
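A minimal sketch of that backtracking idea applied to this problem (my own illustration, without the candidate-caching optimizations described above):
import random

def latin_square_backtrack(n=5):
    """Fill an n x n grid cell by cell, trying candidate values in random
    order and backtracking when a cell has no valid candidate left."""
    grid = [[0] * n for _ in range(n)]

    def candidates(r, c):
        used = set(grid[r]) | {grid[i][c] for i in range(n)}
        values = [v for v in range(1, n + 1) if v not in used]
        random.shuffle(values)
        return values

    def fill(pos):
        if pos == n * n:
            return True
        r, c = divmod(pos, n)
        for v in candidates(r, c):
            grid[r][c] = v
            if fill(pos + 1):
                return True
            grid[r][c] = 0  # undo and try the next candidate
        return False

    fill(0)
    return grid

print(latin_square_backtrack(5))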
I experimented with a brute-force random choice. Generate a row, and if valid, add to the accumulated lines:
# `numbers` here is the list from the question, i.e. list(range(1, 6))
def foo(n=5, maxi=200):
    arr = np.random.choice(numbers, n, replace=False)[None, :]
    for i in range(maxi):
        row = np.random.choice(numbers, n, replace=False)[None, :]
        if (arr == row).any():
            continue
        arr = np.concatenate((arr, row), axis=0)
        if arr.shape[0] == n:
            break
    print(i)
    return arr
Some sample runs:
In [66]: print(foo())
199
[[1 5 4 2 3]
[4 1 5 3 2]
[5 3 2 1 4]
[2 4 3 5 1]]
In [67]: print(foo())
100
[[4 2 3 1 5]
[1 4 5 3 2]
[5 1 2 4 3]
[3 5 1 2 4]
[2 3 4 5 1]]
In [68]: print(foo())
57
[[1 4 5 3 2]
[2 1 3 4 5]
[3 5 4 2 1]
[5 3 2 1 4]
[4 2 1 5 3]]
In [69]: print(foo())
174
[[2 1 5 4 3]
[3 4 1 2 5]
[1 3 2 5 4]
[4 5 3 1 2]
[5 2 4 3 1]]
In [76]: print(foo())
41
[[3 4 5 1 2]
[1 5 2 3 4]
[5 2 3 4 1]
[2 1 4 5 3]
[4 3 1 2 5]]
The required number of tries varies all over the place, with some exceeding my iteration limit.
Without getting into any theory, there's going to be a difference between quickly generating a 2d permutation and generating one that is, in some sense or other, maximally random. I suspect my approach is closer to this random goal than a more systematic and efficient approach (but I can't prove it).
def opFoo():
    numbers = list(range(1, 6))
    result = np.zeros((5, 5), dtype='int32')
    row_index = 0; i = 0
    while row_index < 5:
        np.random.shuffle(numbers)
        for column_index, number in enumerate(numbers):
            if number in result[:, column_index]:
                break
        else:
            result[row_index, :] = numbers
            row_index += 1
        i += 1
    return i, result
In [125]: opFoo()
Out[125]:
(11, array([[2, 3, 1, 5, 4],
[4, 5, 1, 2, 3],
[3, 1, 2, 4, 5],
[1, 3, 5, 4, 2],
[5, 3, 4, 2, 1]]))
Mine is quite a bit slower than the OP's, but mine is correct.
This is an improvement on mine (2x faster):
def foo1(n=5, maxi=300):
    numbers = np.arange(1, n + 1)
    np.random.shuffle(numbers)
    arr = numbers.copy()[None, :]
    for i in range(maxi):
        np.random.shuffle(numbers)
        if (arr == numbers).any():
            continue
        arr = np.concatenate((arr, numbers[None, :]), axis=0)
        if arr.shape[0] == n:
            break
    return arr, i
Why is translated Sudoku solver slower than original?
In that translation of a Java Sudoku solver, I found that using Python lists was faster than numpy arrays.
I may try to adapt that script to this problem - tomorrow.
EDIT: Below is an implementation of the second solution in norok2's answer.
EDIT: we can shuffle the rows of the generated square to make it more random.
So the solve function can be modified to:
def solve(numbers):
    shuffle(numbers)
    shift = randint(1, len(numbers) - 1)
    res = []
    for r in range(len(numbers)):
        res.append(list(numbers))
        numbers = list(numbers[shift:] + numbers[0:shift])
    rows = list(range(len(numbers)))
    shuffle(rows)
    shuffled_res = []
    for i in range(len(rows)):
        shuffled_res.append(res[rows[i]])
    return shuffled_res
EDIT: I previously misunderstood the question.
So, here's a 'quick' method which generates a 'to-some-extent' random solution.
The basic idea is,
a, b, c
b, c, a
c, a, b
We can just shift a row of data by a fixed step to form the next row, which will satisfy our restriction.
So, here's the code:
from random import shuffle, randint

def solve(numbers):
    shuffle(numbers)
    shift = randint(1, len(numbers) - 1)
    res = []
    for r in range(len(numbers)):
        res.append(list(numbers))
        numbers = list(numbers[shift:] + numbers[0:shift])
    return res

def check(arr):
    for c in range(len(arr)):
        col = [arr[r][c] for r in range(len(arr))]
        if len(set(col)) != len(col):
            return False
    return True

if __name__ == '__main__':
    from pprint import pprint
    res = solve(list(range(5)))
    pprint(res)
    print(check(res))
This is a possible solution using itertools, if you don't insist on using numpy (which I'm not familiar with):
import itertools
from random import randint

# itertools.permutations returns an iterator over all possible permutations of the given sequence.
perms = list(itertools.permutations(range(1, 6)))
print(perms[randint(0, len(perms) - 1)])
Can't type code from the phone, here's the pseudocode:
1. Create a matrix with one dimension more than the target matrix (3D).
2. Initialize the 25 elements with the numbers 1 to 5.
3. Iterate over the 25 elements.
4. Choose a random value for the first element from its element list (which contains the numbers 1 through 5).
5. Remove the randomly chosen value from all the elements in its row and column.
6. Repeat steps 4 and 5 for all the elements.
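A minimal sketch of this pseudocode (my own interpretation; since the approach has no backtracking, it simply restarts when a cell runs out of candidates):
import random

def random_latin_square(n=5):
    while True:  # restart whenever a cell runs out of candidates
        candidates = [[set(range(1, n + 1)) for _ in range(n)] for _ in range(n)]
        square = [[0] * n for _ in range(n)]
        stuck = False
        for r in range(n):
            for c in range(n):
                if not candidates[r][c]:
                    stuck = True
                    break
                value = random.choice(sorted(candidates[r][c]))
                square[r][c] = value
                for k in range(n):
                    candidates[r][k].discard(value)  # remove from its row
                    candidates[k][c].discard(value)  # remove from its column
            if stuck:
                break
        if not stuck:
            return square

print(random_latin_square(5))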
I'm working on the following code:
mylist = [1,2,3,4,5,6,7,8,9,10.....]
for x in range(0, len(mylist), 3):
    value = mylist[x:x + 3]
    print(value)
Basically, I'm taking 3 items from mylist at a time. The real code is bigger than that: I do a lot of things with them and return a value, then it takes the next 3 items from mylist and keeps doing that until the end of the list.
But now I have a problem, I need to identify each iteration, but they follow a rule:
The first iteration is from A, the second from B and the third from C.
When it reaches the third, it starts over with A, so what I'm trying to do is something like this:
mylist[0:3] are from A
mylist[3:6] are from B
mylist[6:9] are from C
mylist[9:12] are from A
mylist[12:15] are from B......
The initial idea was to implement an identifier that goes from A to C; each iteration it jumps to the next identifier, and when it reaches C it goes back to A.
So the output seems like this:
[1,2,3] from A
[4,5,6] from B
[7,8,9] from C
[10,11,12] from A
[13,14,15] from B
[16,17,18] from C
[19,20,21] from A.....
My bad solution:
Create identifiers = ['A','B','C'] and multiply it by the length of mylist -> identifiers = ['A','B','C']*len(mylist)
So there are at least as many A's, B's and C's as there are chunks of mylist to identify. Then inside my for loop I keep a counter that increments by 1 on each iteration and use it to index into that list.
mylist = [1,2,3,4,5,6,7,8,9,10.....]
identifier = ['A','B','C']*len(mylist)
counter = -1
for x in range(0, len(mylist), 3):
    value = mylist[x:x + 3]
    counter += 1
    print(value, identifier[counter])
But it's too ugly and not fast at all. Does anyone know a faster way to do it?
Cycle, zip, and unpack:
import itertools

mylist = [1,2,3,4,5,6,7,8,9,10]
for value, iden in zip(mylist, itertools.cycle(['A', 'B', 'C'])):
    print(value, iden)
Output:
1 A
2 B
3 C
4 A
5 B
6 C
7 A
8 B
9 C
10 A
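If you want to keep the grouping of three items per label from the question (my addition, not part of the answer above), the same idea works on the slices:
import itertools

mylist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
chunks = (mylist[x:x + 3] for x in range(0, len(mylist), 3))
for value, iden in zip(chunks, itertools.cycle(['A', 'B', 'C'])):
    print(value, iden)
# [1, 2, 3] A
# [4, 5, 6] B
# [7, 8, 9] C
# [10] A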
You can always use a generator to iterate over your identifiers:
def infinite_generator(seq):
    while True:
        for item in seq:
            yield item
Initialise the identifiers:
identifier = infinite_generator(['A', 'B', 'C'])
Then in your loop:
print(value, next(identifier))
Based on Ignacio's answer, fitted to your problem.
You can first reshape your list into a list of arrays containing 3 elements:
import pandas as pd
import numpy as np
import itertools
mylist = [1,2,3,4,5,6,7,8,9,10]
_reshaped = np.reshape(mylist[:len(mylist)-len(mylist)%3],(-1,3))
print(_reshaped)
[[1 2 3]
[4 5 6]
[7 8 9]]
Note that it works only if your list contains a multiple of 3 elements (so you need to drop the last elements in order to respect this condition: mylist[:len(mylist)-len(mylist)%3]) - see Understanding slice notation.
See UPDATE section for a reshape that fits to your question.
Then apply Ignacio's solution on the reshaped list
for value, iden in zip(_reshaped, itertools.cycle(('A', 'B', 'C'))):
print(value, iden)
[1 2 3] A
[4 5 6] B
[7 8 9] C
UPDATE
You can use @NedBatchelder's chunk generator to reshape your array as expected:
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
mylist = [1,2,3,4,5,6,7,8,9,10]
_reshaped = list(chunks(mylist, 3))
print(_reshaped)
[[1 2 3]
[4 5 6]
[7 8 9]
[10]]
Then:
for value, iden in zip(_reshaped, itertools.cycle(('A', 'B', 'C'))):
print(value, iden)
[1 2 3] A
[4 5 6] B
[7 8 9] C
[10] A
Performances
Your solution : 1.32 ms ± 94.3 µs per loop
With a reshaped list : 1.32 ms ± 84.6 µs per loop
Notice that there is no significant difference in performance for an equivalent result.
You could create a generator for the slices:
grouped_items = zip(*[seq[i::3] for i in range(3)])
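For example (my own illustration of the one-liner above; note that zip stops at the shortest slice, so leftover items that don't fill a full group of three are dropped):
from itertools import cycle

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9]
grouped_items = zip(*[seq[i::3] for i in range(3)])
for group, label in zip(grouped_items, cycle('ABC')):
    print(group, label)
# (1, 2, 3) A
# (4, 5, 6) B
# (7, 8, 9) C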