I have the following list:
indices_to_remove: [0,1,2,3,..,600,800,801,802,....,1200,1600,1601,1602,...,1800]
I have basically 3 subsets of consecutive indices:
0-600
800-1200
1600-1800
I would like to create 3 different small lists that will include only consecutive numbers.
Expected outcome:
indices_to_remove_1 : [0,1,2,3,....,600]
indices_to_remove_2 : [800,801,802,....,1200]
indices_to_remove_3 : [1600,1601,1602,....., 1800]
P.S.: The numbers are arbitrary and random; moreover, I may encounter more or fewer than 3 subsets.
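For reference, here is a minimal sketch of the same idea using only the standard library's itertools.groupby (the name split_consecutive is just illustrative): consecutive values share a constant value-minus-position offset, so grouping by that offset splits the list into runs.
from itertools import groupby

def split_consecutive(indices):
    # Consecutive values keep a constant (value - position) offset,
    # so grouping by that offset yields one group per consecutive run.
    runs = []
    for _, group in groupby(enumerate(indices), key=lambda p: p[1] - p[0]):
        runs.append([value for _, value in group])
    return runs

# split_consecutive([0, 1, 2, 800, 801, 1600]) -> [[0, 1, 2], [800, 801], [1600]]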
Another way is using more_itertools.consecutive_groups (using Stephen's list as an example):
import more_itertools as mit
for group in mit.consecutive_groups(indices_to_remove):
    print(list(group))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]
[160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170]
I like to use generators for this sort of problem. You can do it like this:
Split Non-Consecutive Data:
def split_non_consecutive(data):
    data = iter(data)
    val = next(data)
    chunk = []
    try:
        while True:
            chunk.append(val)
            val = next(data)
            if val != chunk[-1] + 1:
                yield chunk
                chunk = []
    except StopIteration:
        if chunk:
            yield chunk
Test Code:
indices_to_remove = (
    list(range(0, 11)) +
    list(range(80, 91)) +
    list(range(160, 171))
)
for i in split_non_consecutive(indices_to_remove):
    print(i)
Results:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]
[160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170]
Without making it complicated, you can simply solve it with something like this:
def chunk_lists_(data_):
    consecutive_list = []
    for chunks in range(len(data_)):
        try:
            # check consecutiveness
            if data_[chunks + 1] - data_[chunks] == 1:
                # check if it's already in the list
                if data_[chunks] not in consecutive_list:
                    consecutive_list.append(data_[chunks])
                # add the last one too
                consecutive_list.append(data_[chunks + 1])
            else:
                # yield here and empty the list
                yield consecutive_list
                consecutive_list = []
        except IndexError:
            # the last element has no successor
            pass
    yield consecutive_list
Test:
# Stephen's list
print(list(chunk_lists_(list(range(0, 11)) +
                        list(range(80, 91)) +
                        list(range(160, 171)))))
output:
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90], [160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170]]
I have the following dataframe and another list of index positions derived from a condition. I want to create new dataframes based on those index positions and then check a condition on each of them.
import numpy as np
import pandas as pd

df = pd.DataFrame()
df['index'] = [0, 28, 35, 49, 85, 105, 208, 386, 419, 512, 816, 888, 914, 989]
df['diff_in_min'] = [5, 35, 42, 46, 345, 85, 96, 107, 119, 325, 8, 56, 55, 216]
df['val_1'] = [5, 25, 2, 4, 2, 5, 69, 6, 8, 7, 55, 85, 8, 67]
df['val_2'] = [8, 89, 8, 5, 7, 57, 8, 57, 4, 8, 74, 65, 55, 74]

re_ind = list(np.where(df['diff_in_min'] >= 300))
# re_ind: [np.array([85, 512], dtype='int64')]
I just want to create new dataframes based on the re_ind positions, e.g.:
first_df = df[0:85]
another_df = df[85:512]
last_df = df[512:]
and for each dataframe I want to check one condition:
count = 0

temp_df = df[:re_ind[0]]
if temp_df['diff_in_min'].sum() > 500:
    count += 1

temp_df = df[re_ind[0]:re_ind[1]]
if temp_df['diff_in_min'].sum() > 500:
    count += 1

temp_df = df[re_ind[1]:]
if temp_df['diff_in_min'].sum() > 500:
    count += 1
How can I do that with a for loop, creating the new dataframes from the existing dataframe?
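Loosely, that loop could be sketched like this (a rough sketch only, run against the sample data above; split_pos and bounds are illustrative names, not from the question):
import numpy as np

# positional indices where diff_in_min >= 300 mark where a new piece starts
split_pos = np.flatnonzero(df['diff_in_min'] >= 300)   # e.g. array([4, 9])
bounds = [0, *split_pos, len(df)]

count = 0
for start, end in zip(bounds[:-1], bounds[1:]):
    temp_df = df.iloc[start:end]   # a new dataframe for this slice
    if temp_df['diff_in_min'].sum() > 500:
        count += 1
print(count)   # 2 for the sample data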
From the sample data: create groups by taking the cumulative sum of df['diff_in_min'] >= 300, then aggregate the sum per group, compare against the other condition, and count the Trues with sum:
s = (df['diff_in_min'] >= 300).cumsum()
out = (df['diff_in_min'].groupby(s).sum() > 500).sum()
print (out)
2
jezrael's answer is much better and more succinct. However, in keeping with your style of programming, here is another way you could tackle it:
import pandas as pd
df = pd.DataFrame()
df['index'] = [ 0, 28, 35, 49, 85, 105, 208, 386, 419, 512, 816, 888, 914, 989]
df['diff_in_min'] = [ 5, 35, 42, 46, 345, 85, 96, 107, 119, 325, 8, 56, 55, 216]
df['val_1'] = [5, 25, 2, 4, 2, 5, 69, 6, 8, 7, 55, 85, 8, 67]
df['val_2'] = [8, 89, 8, 5, 7, 57, 8, 57, 4, 8, 74, 65, 55, 74]
df_list = []
df_list.append(df[df['index']<85])
df_list.append(df[(df['index']>=85) & (df['index'] <512)])
df_list.append(df[df['index']>=512])
count = 0
for temp_df in df_list:
    if temp_df['diff_in_min'].sum() > 500:
        count += 1
print(f"Count = {count}")
OUTPUT:
Count = 2
Which is exactly what jezrael got, and why my vote goes to them.
Given an example list a = [311, 7426, 3539, 2077, 13, 558, 288, 176, 6, 196, 91, 54, 5, 202, 116, 95] with n = 16 elements (in general it will be a list with an even number of elements).
I wish to create n/4 lists that would be:
list1 = [311, 13, 6, 5]
list2 = [7426, 558, 196, 202]
list3 = [3539, 288, 91, 116]
list4 = [2077, 176, 54, 95]
(The solution is not to take an element every n, such as a[i::3] in a for loop, because values are excluded as the sliding window moves to the left.)
Thanks for the tips!
UPDATE:
Thanks for the solutions, which work well for this particular example. I realized, however, that my problem is a bit more complex:
the list a is generated dynamically and can shrink or grow. Say the list grows by another group, i.e. to 20 elements; then the output should be 5 lists, following the same concept. Example:
a = [311, 7426, 3539, 2077, 1 ,13, 558, 288, 176, 1, 6, 196, 91, 54, 1, 5, 202, 116, 95, 1]
Now the output should be:
list1 = [311, 13, 6, 5]
list2 = [7426, 558, 196, 202]
list3 = [3539, 288, 91, 116]
list4 = [2077, 176, 54, 95]
list5 = [1, 1, 1, 1]
And so on for whatever size of the list.
Thanks again!
I'm assuming the length of the list a is a multiple of 4. You can use numpy for your problem.
import numpy as np

a = [...]
# Reshape into 4 rows of len(a)//4 columns, then transpose so that each
# resulting row is one of the len(a)//4 groups of 4 elements.
desired_shape = (-1, len(a)//4)
arr = np.array(a).reshape(desired_shape).transpose().tolist()
Output:
[[311, 13, 6, 5],
[7426, 558, 196, 202],
[3539, 288, 91, 116],
[2077, 176, 54, 95],
[1, 1, 1, 1]]
Unpack the list into variables or iterate over them as desirable.
Consult numpy.transpose and numpy.reshape to understand their usage.
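For example, if the number of groups can vary, iterating is usually simpler than unpacking into a fixed set of names (a small usage sketch based on the arr built above):
for i, group in enumerate(arr, start=1):
    print(f"list{i} = {group}")
# list1 = [311, 13, 6, 5]
# list2 = [7426, 558, 196, 202]
# ...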
One option: nested list comprehension.
split in n/4 chunks of 4 items
out = [[a[i + j*(len(a)//4)] for j in range(4)]
       for i in range(len(a)//4)]
Output:
[[311, 13, 6, 5],
 [7426, 558, 196, 202],
 [3539, 288, 91, 116],
 [2077, 176, 54, 95],
 [1, 1, 1, 1]]
split in 4 chunks of n/4 items
out = [[a[i+4*j] for j in range(len(a)//4)]
       for i in range(4)]
Output:
[[311, 1, 176, 91, 202],
[7426, 13, 1, 54, 116],
[3539, 558, 6, 1, 95],
[2077, 288, 196, 5, 1]]
To split in lists:
list1, list2, list3, list4 = out
Although it is not easy to do this programmatically for an arbitrary number of groups (and using many separate variables is not recommended).
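If separately named groups are needed anyway, one common workaround is to keep the chunks in a dict keyed by generated names (a hypothetical sketch reusing the out computed above):
groups = {f"list{i}": chunk for i, chunk in enumerate(out, start=1)}
print(groups["list1"])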
I am now working on a Python algorithm and I am new to Python. I'd like to generate a list of numbers like 4, 7, 8, 11, 12, 13, 16, 17, 18, 19, 22, 23, 24, 25... with 2 for loops.
I've done some work to find some of the numbers and I am close to the result I want, which is to generate a list containing these numbers.
My code is here:
for x in range(0, 6, 1):
    start_ind = int(((x+3) * (x+2)) / 2 + 1)
    print("start index is ", [start_ind], x)
    start_node = node[start_ind]
    for y in range(0, x):
        ind = start_ind + y + 1
        ind_list = node[ind]
        index = [ind_list]
        print(index)
node is a list:
node = ['n%d' % i for i in range(0, 36, 1)]
What I received from this code is:
start index is [7] 1
['n8']
start index is [11] 2
['n12']
['n13']
start index is [16] 3
['n17']
['n18']
['n19']
start index is [22] 4
['n23']
['n24']
['n25']
['n26']
start index is [29] 5
['n30']
['n31']
['n32']
['n33']
['n34']
This seems to give the same list, and I think it's much clearer what's happening!
val = 4
result = []
for i in range(1, 7):
    for j in range(val, val+i):
        val = val+1
        result.append(j)
    val = j+3
print(result)
I do not think you need a loop for this, let alone two:
import numpy as np

# step sizes: mostly 1s, with a 3 at each step that starts a new run of consecutive numbers
dif = np.ones(100, dtype=np.int32)
dif[np.cumsum(np.arange(14))] = 3
# running total of the steps, shifted so the sequence starts at 4
(1 + np.cumsum(dif)).tolist()
output
[4, 7, 8, 11, 12, 13, 16, 17, 18, 19, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 46, 47, 48, 49, 50, 51, 52, 53, 56, 57, 58, 59, 60, 61, 62, 63, 64, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 121, 122, 123, 124, 125, 126, 127, 128, 129]
ind_list = []
start_ind = 4
for x in range(0, 6):
    ind_list.append(start_ind)
    for y in range(1, x+1):
        ind_list.append(start_ind + y)
    start_ind = ind_list[len(ind_list)-1]+3
print(ind_list)
You could probably use this. The print function works fine, and the list, I assume, works fairly well for the numbers provided. It appends the new number at the beginning of the outer loop, with a continually longer inner loop each time for x. I'm assuming the number sequence is 4, 4+3, 4+3+1, 4+3+1+3, 4+3+1+3+1, 4+3+1+3+1+1, 4+3+1+3+1+1+3, ....
I want to create an algorithm that finds all values that can be created with the 4 basic operations + - * / from a list of numbers l, where 2 <= len(l) <= 6 and every number n >= 1.
All numbers must be integers.
I have seen a lot of similar topics, but I don't want to use the itertools approach; I want to understand why my recursive program doesn't work.
I tried to make a costly recursive program that does an exhaustive search of all the possible combinations, like a tree with n = len(l) starting points where each branch has depth n.
L: the list of starting numbers
C: the current value
M: the list of all possible values
My code:
from copy import deepcopy

def result(L, C, M):
    if len(L) > 0:
        for i in range(len(L)):
            a = L[i]
            if C >= a:
                l = deepcopy(L)
                l.remove(a)
                m = []  # new current values
                # +
                m.append(C + a)
                # * 1 is useless
                if C != 1 or a != 1:
                    m.append(C * a)
                # must be an integer
                if C % a == 0 and a <= C:  # a can't be == 0
                    m.append(C // a)
                # 0 is useless
                if C != a:
                    m.append(C - a)
                for r in m:  # update all possible values
                    if r not in M:
                        M.append(r)
                for r in m:  # call the function again with the new current values and the updated list of remaining numbers
                    result(l, r, M)

def values_possible(L):
    m = []
    for i in L:
        l = deepcopy(L)
        l.remove(i)
        result(l, i, m)
    m.sort()
    return m
For small lists without duplicate numbers, my algorithm seems to work but with lists like [1,1,2,2,4,5] it misses some values.
It returns:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 94, 95, 96, 97, 98, 99, 100, 101,
102, 104, 105, 110, 112, 115, 116, 118, 119, 120, 121, 122, 124, 125, 128, 130,
140, 160]
but it misses 93,108,114,117,123,126,132,135,150,180.
Let's take an even simpler example: [1, 1, 2, 2].
One of the numbers your algorithm can't find is 9 = (1 + 2) * (1 + 2).
Your algorithm simply cannot come up with this computation because it always deals with a "current" value C. You can start with C = 1 + 2, but you cannot find the next 1 + 2 because it has to be constructed separately.
So your recursion will have to do at least some kind of partitioning into two groups, finding all the answers for each group and then combining them.
Something like this could work:
def partitions(L):
    if not L:
        yield ([], [])
    else:
        for l, r in partitions(L[1:]):
            yield [L[0]] + l, r
            yield l, [L[0]] + r

def values_possible(L):
    if len(L) == 1:
        return L
    results = set()
    for a, b in partitions(L):
        if not a or not b:
            continue
        for va in values_possible(a):
            for vb in values_possible(b):
                results.add(va + vb)
                results.add(va * vb)
                if va > vb:
                    results.add(va - vb)
                if va % vb == 0:
                    results.add(va // vb)
    return results
Not too efficient though.
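As a quick sanity check against the simpler example above (assuming both functions are defined as shown), the value 9 that the original approach missed is now found:
vals = values_possible([1, 1, 2, 2])
print(9 in vals)        # True: (1 + 2) * (1 + 2)
print(sorted(vals))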
Consider a numpy array of the form:
> a = np.random.uniform(0., 100., (10, 1000))
and a list of indexes to elements in that array that I want to keep track of:
> idx_s = [0, 5, 7, 9, 12, 17, 19, 32, 33, 35, 36, 39, 40, 41, 42, 45, 47, 51, 53, 57, 59, 60, 61, 62, 63, 65, 66, 70, 71, 73, 75, 81, 83, 85, 87, 88, 89, 90, 91, 93, 94, 96, 98, 100, 106, 107, 108, 118, 119, 121, 124, 126, 127, 128, 129, 133, 135, 138, 142, 143, 144, 146, 147, 150]
I also have a list of indexes of elements I need to remove from a:
> idx_d = [4, 12, 18, 20, 21, 22, 26, 28, 29, 31, 37, 43, 48, 54, 58, 74, 80, 86, 99, 109, 110, 113, 117, 134, 139, 140, 141, 148, 154, 156, 160, 166, 169, 175, 183, 194, 198, 199, 219, 220, 237, 239, 241, 250]
which I delete with:
> a_d = np.delete(a, idx_d, axis=1)
But this process alters the positions of elements in a_d: the indexes in idx_s no longer point to the same elements in a_d that they pointed to in a, since np.delete() shifted them. For example: if I delete the element at index 4 from a, then every index after 4 in idx_s now points one position to the right of its intended element in a_d.
v Index 5 points to 'f' in a
0 1 2 3 4 5 6
a -> a b c d e f g ... # Remove 4th element 'e' from a
a_d -> a b c d f g h ... # Now index 5 no longer points to 'f' in a_d, but to 'g'
0 1 2 3 4 5 6
How do I update the idx_s list of indexes, so that the same elements that were pointed in a are pointed in a_d?
If an element pointed to by idx_s is also listed in idx_d (and thus removed from a and not present in a_d), its index should simply be discarded.
You could use np.searchsorted to get the shifts for each element in idx_s and then simply subtract those from idx_s for the new shifted-down values, like so -
idx_s - np.searchsorted(idx_d, idx_s)
If idx_d is not already sorted, we need to feed in a sorted version. Thus, for simplicity assuming these as arrays, we would have -
idx_s = idx_s[~np.in1d(idx_s, idx_d)]
out = idx_s - np.searchsorted(np.sort(idx_d), idx_s)
A sample run to help out getting a better picture -
In [530]: idx_s
Out[530]: array([19, 5, 17, 9, 12, 7, 0])
In [531]: idx_d
Out[531]: array([12, 4, 18])
In [532]: idx_s = idx_s[~np.in1d(idx_s, idx_d)] # Remove matching ones
In [533]: idx_s
Out[533]: array([19, 5, 17, 9, 7, 0])
In [534]: idx_s - np.searchsorted(np.sort(idx_d), idx_s) # Updated idx_s
Out[534]: array([16, 4, 15, 8, 6, 0])
idx_s = [0, 5, 7, 9, 12, 17, 19]
idx_d = [4, 12, 18]
def worker(a, v, i=0):
    if not a:
        return []
    elif not v:
        # no deleted indices left: shift the remaining kept indices down by the offset
        return [x - i for x in a]
    elif a[0] == v[0]:
        # this index is itself deleted, so drop it
        return worker(a[1:], v[1:], i+1)
    elif a[0] < v[0]:
        # no further deletions below a[0]: shift it down by the current offset
        return [a[0] - i] + worker(a[1:], v, i)
    else:
        # a deleted index lies below a[0]: consume it and increase the offset
        return worker(a, v[1:], i+1)

worker(idx_s, idx_d)
# [0, 4, 6, 8, 15, 16]