I have a list that looks like this:
L = [2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1]
I want to check whether the 2-1-2 pattern is always respected or whether there is an outlier somewhere.
Is there a simple way to do this in Python?
from itertools import cycle
L = [2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
seq = cycle([2, 1])
for idx, el in enumerate(L):
    if el != next(seq):
        raise ValueError(f"Sequence not followed at index {idx}")
What does "2-1-2 is always respected" mean, precisely?
I assume you want to check whether L is an alternating sequence of 2 and 1, starting with 2.
That's easy to check:
def check(L):
    if len(L) < 3:
        return False
    even_indices_all_two = set(L[::2]) == {2}
    odd_indices_all_one = set(L[1::2]) == {1}
    return even_indices_all_two and odd_indices_all_one and L[-1] == 2
If you don't require L to end with 2, remove the final "and L[-1] == 2" condition.
If you wonder about the many colons, check this post to understand slicing.
I use the set to check if a sequence contains only one distinct item.
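As a quick sanity check of the approach above, here is a minimal sketch that runs the same kind of test on a list shaped like the one in the question and on a copy with one outlier. The helper names is_alternating and first_outlier and the injected outlier are illustrative assumptions, not part of the original answer, and the end-with-2 requirement is left out here.

```python
def is_alternating(seq, pattern=(2, 1)):
    """Return True if seq alternates between the two pattern values (hypothetical helper)."""
    first, second = pattern
    # Even positions must all hold `first`, odd positions must all hold `second`.
    return set(seq[::2]) <= {first} and set(seq[1::2]) <= {second}


def first_outlier(seq, pattern=(2, 1)):
    """Return the index of the first element breaking the pattern, or None."""
    for idx, el in enumerate(seq):
        if el != pattern[idx % len(pattern)]:
            return idx
    return None


good = [2, 1] * 17          # same shape as the list in the question
bad = good.copy()
bad[7] = 2                  # inject an outlier (assumed test data)

print(is_alternating(good), first_outlier(good))  # True None
print(is_alternating(bad), first_outlier(bad))    # False 7
```

The set-based check answers only yes or no, while the loop also reports where the pattern breaks, which may be more useful when hunting for the outlier.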
I need to add the values from one list to a specific section of another list.
For example:
a = [... , 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
b = [3, 3, 3]
...
ab = [..., 1, 1, 1, 1, 1, 4, 4, 4, 1, 1, 1, 1, ...]
I need a fast method because it will be repeated several times in a row, and it shouldn't iterate through the whole list because the list is quite long (ca. 1000 elements). The indexes where the values should be added are known.
Thanks for any kind of help!
a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
b = [3, 3, 3]
start_index = 5
for ind, value in enumerate(b):
    a[start_index + ind] += value
print(a)
[1, 1, 1, 1, 1, 4, 4, 4, 1, 1, 1, 1]
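Since the start index is known, the same update can also be written with slice assignment, so that only the affected section is read and rewritten. This is a minimal sketch, assuming b fits entirely inside a from the start index on; the function name add_slice is made up for illustration.

```python
def add_slice(target, values, start):
    """Add values element-wise onto target[start:start + len(values)] in place."""
    end = start + len(values)
    if start < 0 or end > len(target):
        raise IndexError("values do not fit into target at this start index")
    # Only the affected slice is read and rewritten; the rest of the list is untouched.
    target[start:end] = [x + y for x, y in zip(target[start:end], values)]
    return target


a = [1] * 12
b = [3, 3, 3]
print(add_slice(a, b, 5))  # [1, 1, 1, 1, 1, 4, 4, 4, 1, 1, 1, 1]
```

The work done is proportional to len(b), not len(a), which matches the requirement of not walking the whole list.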
You could try something like this:
a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
b = [3, 3, 3]
def add_at_index(list1, list2, index_start):
    # Work on the arguments rather than on the global names a and b.
    for idx, v in enumerate(list2):
        list1[idx + index_start] += v
    return list1
print(add_at_index(a, b, 4))
def merge_list(list1, list2, index):
    if index + len(list2) > len(list1):
        print("Invalid Index")
        return None
    for i in range(len(list2)):
        list1[index + i] += list2[i]
    return list1
# Main
a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
b = [3, 3, 3]
# Choose an index in the third argument of the function
print(merge_list(a, b, 5))
I have arrays like
arr1['a'] = np.array([1, 1, 1])
arr1['b'] = np.array([1, 1, 1])
arr1['c'] = np.array([1, 1, 1])
b_index = [0, 2, 5]
arr2['a'] = np.array([2, 2, 2, 2, 2, 2])
arr2['b'] = np.array([2, 2, 2, 2, 2, 2])
arr2['c'] = np.array([2, 2, 2, 2, 2, 2])
arr2['f'] = np.array([2, 2, 2, 2, 2, 2])
b_index is the list of indexes.
I want to copy the values from arr1 into arr2 at the positions given in b_index,
so the result should be something like
arr2['a'] = np.array([1, 2, 1, 2, 2, 1])
arr2['b'] = np.array([1, 2, 1, 2, 2, 1])
arr2['c'] = np.array([1, 2, 1, 2, 2, 1])
arr2['f'] = np.array([2, 2, 2, 2, 2, 2])
I can obviously do this using loops, but I am not sure that is the right way to do it.
We are talking about 100 columns ('a', 'b', 'c', ...) and around 1 million rows.
One solution, which might not be optimal, is to use advanced array indexing:
In [1]: arr = np.ones((5, 3))
In [2]: arr2 = np.full((5, 5), 2)
In [3]: arr2[:, [1, 2, 4]] = arr
In [4]: arr2
Out[4]:
array([[2, 1, 1, 2, 1],
       [2, 1, 1, 2, 1],
       [2, 1, 1, 2, 1],
       [2, 1, 1, 2, 1],
       [2, 1, 1, 2, 1]])
Does that help?
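Applied to the layout in the question, the same indexing could look like the sketch below. It assumes arr1 and arr2 are plain dictionaries mapping column names to 1-D arrays, which the question does not state explicitly; with a single 2-D array the loop over columns would not be needed.

```python
import numpy as np

# Assumed containers: dicts of column name -> 1-D array.
arr1 = {k: np.array([1, 1, 1]) for k in "abc"}
arr2 = {k: np.array([2, 2, 2, 2, 2, 2]) for k in "abcf"}
b_index = [0, 2, 5]

# One vectorized assignment per shared column; the Python loop runs over
# the columns, not over the rows.
for key in arr1:
    arr2[key][b_index] = arr1[key]

print(arr2["a"])  # [1 2 1 2 2 1]
print(arr2["f"])  # [2 2 2 2 2 2]  (no counterpart in arr1, left unchanged)
```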
I have a numpy array like the following:
A = np.array([[1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 5, 1, 1, 1],
              [1, 1, 1, 1, 3, 3, 1, 1],
              [1, 1, 1, 1, 1, 1, 2, 1],
              [1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1, 4, 1, 1]])
I am looking for the index of the minimum in each column. I found it using numpy.argmin as follows:
I = np.zeros(A.shape[1], dtype=np.int64)
for j in range(A.shape[1]):
    I[j] = np.argmin(A[:, j])
This gives me I = [0, 0, 0, 0, 0, 0, 0, 0]. Since there are ties, I could instead obtain I = [0, 1, 2, 3, 4, 0, 5, 1], where ties are broken in favour of the index that has been used least so far (among the previous columns).
In more detail:
For j=0, we have np.argmin(A[:, 0]) in [0, 1, 2, 3, 4, 5] and, say, we choose np.argmin(A[:, 0]) = 0.
For j=1, we have np.argmin(A[:, 1]) in [0, 1, 2, 3, 4, 5] and we have to choose the minimum index from [1, 2, 3, 4, 5] since these indices are the least used (we already chose np.argmin(A[:, 0]) = 0 for j=0). Say, we choose np.argmin(A[:, 1])=1.
For j=2, we have np.argmin(A[:, 2]) in [0, 1, 2, 3, 4, 5] and we have to choose the minimum index from [2, 3, 4, 5] since these indices are the least used.
We continue in this way...
For j=5, we have np.argmin(A[:, 5]) in [0, 1, 3, 4] and we have to choose the minimum index from [0, 1, 3, 4] since these indices are the least used. Say we choose np.argmin(A[:, 5])=0.
For j=6, we have np.argmin(A[:, 6]) in [0, 1, 2, 4, 5] and we have to choose from [5] since this index is the least used. We choose np.argmin(A[:, 6])=5.
For j=7, we have np.argmin(A[:, 7]) in [0, 1, 2, 3, 4, 5] and we have to choose from [1, 2, 3, 4, 5] since these indices are the least used. Say we choose np.argmin(A[:, 7])=1.
I hope it is clear. My question is how to find the minimum indices and break ties by the least used indices in Python?
You could use min combined with a dictionary for keeping the counts of each index:
import numpy as np
A = np.array([[1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 5, 1, 1, 1],
              [1, 1, 1, 1, 3, 3, 1, 1],
              [1, 1, 1, 1, 1, 1, 2, 1],
              [1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1, 4, 1, 1]])
counts = {}
I = np.zeros(A.shape[1], dtype=np.int64)
for j in range(A.shape[1]):
    _, _, i = min([(v, counts.get(i, 0), i) for i, v in enumerate(A[:, j])])
    counts[i] = counts.get(i, 0) + 1
    I[j] = i
print(I)
Output
[0 1 2 3 4 0 5 1]
The idea is to build the key (value, count of index, index) and rely on the normal comparison of tuples: if the values are equal, the index with the lower count is selected, and if the counts are also equal, the lower index is selected.
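To see the tuple comparison in isolation, here is a small sketch with made-up candidates. The values and counts are illustrative only and are not taken from the array above.

```python
# Each candidate is (value, times_the_index_was_used, index).
candidates = [
    (1, 2, 0),  # minimal value, but index 0 was already used twice
    (1, 1, 3),  # minimal value, used once
    (1, 1, 1),  # minimal value, used once, lower index than 3
    (5, 0, 4),  # never used, but the value is not minimal
]

# Tuples compare element by element: value first, then count, then index.
value, count, index = min(candidates)
print(value, count, index)  # 1 1 1
```

Because the value comes first in the key, an unused index never wins over a smaller value; the count only matters among ties.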
I want to create a new list based on the cumulative sums of the numbers in a list. The input is ideal: it can be split into consecutive subsets whose sums are all equal. The subsets need not have the same length. The number of subsets is given as input.
In the output, each subset is represented by an increasing integer [0, 1, 2, 3, ...] that replaces the original values. The number of distinct integers equals the number of subsets.
Example:
number of subsets = 2
input = [1, 4, 5]
#cumsum = [1, 5, 10]
subsets = [1,5], [10]
output-subsets = [0,0], [1]
output = [0, 0, 1]
Example1:
number of subsets = 4
input = [1, 2, 3, 4, 2, 5, 1, 6]
#cumsum = [1, 3, 6, 10, 12, 17, 18, 24]
subsets = [1,3,6], [10, 12],[17, 18], [24]
output-subsets = [0, 0, 0], [1, 1], [2, 2], [3]
output = [0, 0, 0, 1, 1, 2, 2, 3]
Example2:
number of subsets = 2
input = [1, 2, 3, 4, 2, 5, 1, 6]
#cumsum = [1, 3, 6, 10, 12, 17, 18, 24]
subsets = [1, 3, 6, 10, 12],[17, 18, 24]
output-subsets = [0, 0, 0, 0, 0], [1, 1, 1]
output = [0, 0, 0, 0, 0, 1, 1, 1]
I tried modifying the code from another SO question:
def changelist(lis, t):
    total = 0
    s = sum(lis)
    subset = s/t
    for x in lis:
        total += x
        i = 1
        if total <= subset:
            i = 0
        yield i
#changelist([input array], number of subset)
print(list(changelist([1, 2, 3, 4, 2, 5, 1, 6], 4)))
but only the first subset is correct:
output = [0, 0, 0, 1, 1, 1, 1, 1]
I think numpy.array_split is problematic; see "strange behaviour of numpy array_split".
I would really love any kind of explanation or help.
This should solve your problem:
def changelist(l, t):
    subset = sum(l) / t
    current, total = 0, 0
    for x in l:
        total += x
        if total > subset:
            current, total = current + 1, x
        yield current
Examples:
>>> list(changelist([1, 4, 5], 2))
[0, 0, 1]
>>> list(changelist([1, 2, 3, 4, 2, 5, 1, 6], 4))
[0, 0, 0, 1, 1, 2, 2, 3]
>>> list(changelist([1, 2, 3, 4, 2, 5, 1, 6], 2))
[0, 0, 0, 0, 0, 1, 1, 1]
How does it work?
current stores the "id" of the current subset, total the sum of the current subset.
For each element x in your initial list l, you add its value to the current total. If this total is greater than the expected sum of each subset (subset in my code), you know that you are in the next subset (current = current + 1), and you "reset" the total of the current subset to the current element (total = x).
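The bookkeeping can be made visible with a small tracing variant of the function. This is only a sketch: trace_changelist is a hypothetical name, and it assumes, like the answer, that the input splits exactly into subsets of equal sum.

```python
def trace_changelist(values, n_subsets):
    """Return (element, running total, subset id) for every element."""
    target = sum(values) / n_subsets   # expected sum of each subset
    current, total = 0, 0
    rows = []
    for x in values:
        total += x
        if total > target:
            # x overflows the current subset, so it opens the next one.
            current, total = current + 1, x
        rows.append((x, total, current))
    return rows


for row in trace_changelist([1, 2, 3, 4, 2, 5, 1, 6], 4):
    print(row)
```

In this run the running total reaches the target of 6 at the end of every subset, and the element that would push it past 6 becomes the first member of the next subset.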
You can use NumPy here for a vectorized solution after converting the input to an array, with N as the number of subsets:
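import numpy as np
def modified_cumsum(input,N):
    A = np.asarray(input).cumsum()
    # Flag cumulative sums that hit a subset boundary (k * total / N),
    # shift the flags right by one, then count them to get the subset ids.
    return np.append(False,np.in1d(A,(1+np.arange(N))*A[-1]/N))[:-1].cumsum()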
Sample runs -
In [31]: N = 2 #number of subsets
    ...: input = [1, 4, 5]
    ...:
In [32]: modified_cumsum(input,N)
Out[32]: array([0, 0, 1])
In [33]: N = 4 #number of subsets
    ...: input = [1, 2, 3, 4, 2, 5, 1, 6]
    ...:
In [34]: modified_cumsum(input,N)
Out[34]: array([0, 0, 0, 1, 1, 2, 2, 3])
In [35]: N = 2 #number of subsets
    ...: input = [1, 2, 3, 4, 2, 5, 1, 6]
    ...:
In [36]: modified_cumsum(input,N)
Out[36]: array([0, 0, 0, 0, 0, 1, 1, 1])
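For readers without NumPy, the same cumulative-sum idea can be sketched with itertools.accumulate. This is a swapped-in pure-Python variant, not the answer's code; the name cumsum_labels is hypothetical, and it assumes a non-empty list of positive integers that splits exactly into n equal-sum subsets. A subset closes wherever the running sum hits a multiple of total / n, which is checked in integer arithmetic to avoid float comparisons.

```python
from itertools import accumulate


def cumsum_labels(values, n):
    """Label each element with the id of the equal-sum subset it falls in."""
    sums = list(accumulate(values))
    total = sums[-1]
    labels, current = [], 0
    for s in sums:
        labels.append(current)
        # s is a multiple of total / n  <=>  s * n is divisible by total.
        if (s * n) % total == 0:
            current += 1      # the next element starts a new subset
    return labels


print(cumsum_labels([1, 4, 5], 2))                 # [0, 0, 1]
print(cumsum_labels([1, 2, 3, 4, 2, 5, 1, 6], 4))  # [0, 0, 0, 1, 1, 2, 2, 3]
print(cumsum_labels([1, 2, 3, 4, 2, 5, 1, 6], 2))  # [0, 0, 0, 0, 0, 1, 1, 1]
```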