Let's say I have an array (or even a list) that looks like:
tmp_data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
And then I have another array of distance values:
dist_data = [15.625, 46.875, 78.125, 109.375, 140.625, 171.875, 203.125, 234.375, 265.625, 296.875]
Now, say I want to define a distance threshold and perform an operation on tmp_data for every window of that size. For this example, let's just take the max value, and let's set the threshold distance to 100. What I would like to do is take the elements that fall within each 100-distance-unit window and replace all of them with the maximum value of that window. For example, I would want the final output to be
max_tmp_data_100 = [2,2,2,5,5,5,8,8,8,9]
This is because the first 3 elements in dist_data are below 100, so we take the first three elements of tmp_data (0, 1, 2), get their maximum, and replace all of them with that value, 2.
Then the next set of data, below the next threshold of 200, would be
tmp_dist_array_100 = [109.375, 140.625, 171.875]
tmp_data_100 = [3,4,5]
max_tmp_data_100 = [5,5,5]
(append to [2,2,2])
I have come up with the following:
import numpy as np
final_res = 100  # threshold distance
# Initialize
final_array = []
d_array = []
idx = 1
for i in range(0, 10):
    if dist_data[i] < idx * final_res:
        d_array.append(tmp_data[i])
    elif dist_data[i] > idx * final_res:
        # Now get the values
        max_val = np.amax(d_array)
        new_array = np.ones(len(d_array)) * max_val
        final_array.extend(new_array)
        idx = idx + 1
But the outcome is
[2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0, 5.0]
When it should be [2,2,2,5,5,5,8,8,8,9]
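A minimal sketch of how the loop could be repaired, assuming the bins are [0, 100), [100, 200), [200, 300) and so on, and that dist_data is sorted ascending. The original loop loses data in three places: the element that crosses a threshold is never appended, d_array is never reset, and the last bin is never flushed (a distance exactly equal to a threshold also matches neither branch). `block_max` is a hypothetical helper name. Note that with bins of width 100 the last four distances (203.125 to 296.875) all fall in [200, 300), so this returns 9 for them rather than 8.

```python
def block_max(tmp_data, dist_data, final_res):
    """Replace each value with the max of its distance bin
    [k * final_res, (k + 1) * final_res). Assumes dist_data is sorted."""
    final_array = []
    d_array = []  # values in the bin that is currently open
    idx = 1       # upper edge of the open bin is idx * final_res
    for value, dist in zip(tmp_data, dist_data):
        # Close every bin that this distance has moved past (skips empty bins too).
        while dist >= idx * final_res:
            if d_array:
                final_array.extend([max(d_array)] * len(d_array))
                d_array = []
            idx += 1
        # The current element joins the open bin instead of being dropped.
        d_array.append(value)
    # Flush the last bin, which the loop never closes on its own.
    if d_array:
        final_array.extend([max(d_array)] * len(d_array))
    return final_array
```

With the data from the question, `block_max(tmp_data, dist_data, 100)` returns `[2, 2, 2, 5, 5, 5, 9, 9, 9, 9]`.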
With numpy:
import numpy as np
dist_data = [15.625, 46.875, 78.125, 109.375, 140.625, 171.875, 203.125, 234.375, 265.625, 296.875]
cut = 100
a = np.array(dist_data)
# Index of the last element below each multiple of cut (100, 200, 300, ...)
vals = np.searchsorted(a, np.r_[cut:a.max() + cut:cut]) - 1
print(vals[(a/cut).astype(int)])
It gives:
[2 2 2 5 5 5 9 9 9 9]
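The searchsorted answer prints positions, which coincide with the tmp_data values only because tmp_data here is 0 to 9. A minimal sketch that takes the maxima of the actual tmp_data values instead, assuming dist_data is sorted ascending; `block_max_np` is a hypothetical name, and np.maximum.reduceat takes the max of each run of equal bin numbers:

```python
import numpy as np

def block_max_np(tmp_data, dist_data, cut):
    tmp = np.asarray(tmp_data)
    # Bin number of every element; assumes dist_data is sorted ascending
    bins = (np.asarray(dist_data) // cut).astype(int)
    # Start index of each run of equal bin numbers
    starts = np.flatnonzero(np.r_[True, bins[1:] != bins[:-1]])
    maxima = np.maximum.reduceat(tmp, starts)   # max of each run
    lengths = np.diff(np.r_[starts, len(tmp)])  # size of each run
    return np.repeat(maxima, lengths)
```

For example, `block_max_np(list(range(9, -1, -1)), dist_data, 100)` gives `[9, 9, 9, 6, 6, 6, 3, 3, 3, 3]`, which the index-based versions would not.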
You can do it with groupby:
from itertools import groupby
dist_data = [15.625, 46.875, 78.125, 109.375, 140.625, 171.875, 203.125, 234.375, 265.625, 296.875]
tmp_data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
result = []
index_list = [[dist_data.index(i) for i in l]
              for k, l in groupby(dist_data, key=lambda x: x // 100)]
for i in tmp_data:
    for lst in index_list:
        if i in lst:
            result.append(max(lst))
print(result)
# [2, 2, 2, 5, 5, 5, 9, 9, 9, 9]
As per your requirements, the last 4 elements (203.125 to 296.875) all fall below the next threshold, 300, so they form one group and the max of those 4 elements is 9.
I have four given variables:
group size
total of groups
partial sum
1-D tensor
and I want to add zeros when the sum within a group reaches the partial sum. For example:
groupsize = 4
totalgroups = 3
partialsum = 15
d1tensor = torch.tensor([ 3, 12, 5, 5, 5, 4, 11])
The expected result is:
[ 3, 12, 0, 0, 5, 5, 5, 0, 4, 11, 0, 0]
I have no clue how I can achieve that in pure PyTorch. In Python it would be something like this:
target = [0] * (groupsize * totalgroups)
cursor = 0
current_count = 0
d1tensor = [3, 12, 5, 5, 5, 4, 11]
for idx, ele in enumerate(target):
    subgroup_start = (idx // groupsize) * groupsize
    subgroup_end = subgroup_start + groupsize
    if sum(target[subgroup_start:subgroup_end]) < partialsum:
        target[idx] = d1tensor[cursor]
        cursor += 1
Can anyone help me with that? I have already googled it but couldn't find anything.
Some logic, Numpy and list comprehensions are sufficient here.
I will break it down step by step; you can make it slimmer and prettier afterwards:
import numpy as np
my_val = 15
block_size = 4
total_groups = 3
d1 = [3, 12, 5, 5, 5, 4, 11]
d2 = np.cumsum(d1)
d3 = d2 % my_val == 0  # True where the running sum is a multiple of my_val
split_points = [i + 1 for i, x in enumerate(d3) if x]  # split just after each such element
#### Option 1
split_array = np.split(d1, split_points, axis=0)
padded_arrays = [np.pad(array, (0, block_size - len(array)), mode='constant') for array in split_array] #pad arrays
padded_d1 = np.concatenate(padded_arrays[:total_groups]) #put them together, discard extra group if present
#### Option 2
split_points = [el for el in split_points if el <len(d1)] #make sure we are not splitting on the last element of d1
split_array = np.split(d1, split_points, axis=0)
padded_arrays = [np.pad(array, (0, block_size - len(array)), mode='constant') for array in split_array] #pad arrays
padded_d1 = np.concatenate(padded_arrays)
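Both options rely on the running total hitting an exact multiple of partialsum. If a group can overshoot, nothing gets split: for [10, 8, 7, 9] with partialsum = 15 the running totals are 10, 18, 25, 34. A minimal pure-Python sketch that instead follows the question's loop (keep filling a group while its sum is below partialsum and it is not full, then pad with zeros); `pad_groups` is a hypothetical name:

```python
def pad_groups(values, groupsize, totalgroups, partialsum):
    """Greedy sketch: fill each group until its sum reaches partialsum
    (or the group is full), then pad the rest of the group with zeros."""
    out = []
    it = iter(values)
    for _ in range(totalgroups):
        group, total = [], 0
        while len(group) < groupsize and total < partialsum:
            try:
                v = next(it)
            except StopIteration:  # input exhausted: remaining slots stay 0
                break
            group.append(v)
            total += v
        out.extend(group + [0] * (groupsize - len(group)))
    return out
```

To use it with a tensor, pass `d1tensor.tolist()` and wrap the result with `torch.tensor(...)`.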
I have a list containing some values. I want to calculate the sum of every 5 elements, divide it by 5, and store the result in an (initially empty) list. While doing so, I am not sure whether I can iterate over a list the way I am doing. As a newbie to Python, any help would be much appreciated.
My list looks like this:
My code is:
a = []
i = np.arange(0,125,5)
j = np.arange(5,130,5)
for q,r in i,j:
    cov = (np.sum(l[q:r]))/5
    cov.append(a)
print(a)
I am getting the following error:
Instead of np.sum(l[i:i+5])/5 you can use np.average().
Instead of two arrays, you can use a single range(0, len(l), 5).
Try this:
a = []
for r in range(0, len(l), 5):
    # The last slice may be shorter; slicing past the end never raises IndexError
    cov = np.average(l[r:r+5])
    a.append(cov)
print(a)
If numpy is not a hard requirement I'd definitely do it with something simple like this:
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
values_avg = []
temp_sum = 0
for i in range(len(values)):
    temp_sum += values[i]
    if (i + 1) % 5 == 0:
        values_avg.append(temp_sum / 5)
        temp_sum = 0
print(values_avg)
# [3.0, 8.0, 8.0, 3.0]
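If the list length is a multiple of 5 (or you are happy to drop an incomplete trailing block), NumPy can do this without an explicit loop by reshaping into rows of 5 and averaging each row. A minimal sketch; `block_means` is a hypothetical name:

```python
import numpy as np

def block_means(values, n=5):
    arr = np.asarray(values, dtype=float)
    full = len(arr) // n * n                   # drop an incomplete trailing block
    return arr[:full].reshape(-1, n).mean(axis=1)
```

Unlike the range(0, len(l), 5) answer, this discards a shorter final chunk instead of averaging it.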
I am trying to implement a PMX (partially mapped) crossover. It should work this way:
It takes two lists (parents).
Mark some indexes, preferably a few in the center.
Both parents swap the values at the marked indexes.
The elements at the other indexes stay with their parent, in order.
If the same element is already present in that parent, follow the mapping: check where that element was in the other parent and place it there.
import random
def pm(indA, indB):
    size = min(len(indA), len(indB))
    c1, c2 = [0] * size, [0] * size
    # Initialize the position of each index in the individuals
    for i in range(1, size):
        c1[indA[i]] = i
        c2[indB[i]] = i
    crosspoint1 = random.randint(0, size)
    crosspoint2 = random.randint(0, size - 1)
    if crosspoint2 >= crosspoint1:
        crosspoint2 += 1
    else:  # Swap the two cx points
        crosspoint1, crosspoint2 = crosspoint2, crosspoint1
    for i in range(crosspoint1, crosspoint2):
        # Keep track of the selected values
        temp1 = indA[i]
        temp2 = indB[i]
        # Swap the matched value
        indA[i], indA[c1[temp2]] = temp2, temp1
        indB[i], indB[c2[temp1]] = temp1, temp2
        # Position bookkeeping
        c1[temp1], c1[temp2] = c1[temp2], c1[temp1]
        c2[temp1], c2[temp2] = c2[temp2], c2[temp1]
    return indA, indB
a,b = pm([3, 4, 8, 2, 7, 1, 6, 5],[4, 2, 5, 1, 6, 8, 3, 7])
Error:
in pm
c1[indA[i]] = i
IndexError: list assignment index out of range
Not sure whether there are other errors in your code (I didn't run it), but here's the explanation for this one. In Python (as in most other languages), list (more precisely, sequence) indexes are 0-based:
>>> l = [1, 2, 3, 4, 5, 6]
>>>
>>> for e in l:
... print(e, l.index(e))
...
1 0
2 1
3 2
4 3
5 4
6 5
>>>
>>> l[0]
1
>>> l[5]
6
>>> l[6]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
To summarize your problem:
1. Your indA and indB lists each have 6 elements ([1..6]), and their indexes are [0..5]
2. Your c1 and c2 lists also have 6 elements (indexes also [0..5])
3. But you're using values from #1 as indexes into the lists from #2, and the value 6 is a problem, as there's no such index
To fix your problem, you should use valid index values. Either:
Have the proper (0-based) values in indA and indB (this is the one I'd choose):
a, b = pm([0, 3, 1, 2, 5, 4], [4, 0, 2, 3, 5, 1])
Subtract 1, wherever you encounter values from indA or indB used as indexes:
c1[indA[i] - 1] = i
As general advice: whenever you encounter errors, add print statements before the faulty line (printing the values it uses, or parts of them); that might give you clues that lead to solving the problem yourself.
#EDIT0
Posting (a slightly modified version of) the original code, with the index conversion:
Before the algorithm: subtract 1 (from each element) to have valid indexes
After the algorithm: add 1 to come back to 1 based indexes
code00.py:
#!/usr/bin/env python3
import sys
import random
def pmx_crossover(ind_a, ind_b):
    size = min(len(ind_a), len(ind_b))
    c1, c2 = [0] * size, [0] * size
    # Initialize the position of each index in the individuals
    for i in range(1, size):
        c1[ind_a[i]] = i
        c2[ind_b[i]] = i
    # Choose crossover points
    crosspoint1 = random.randint(0, size)
    crosspoint2 = random.randint(0, size - 1)
    if crosspoint2 >= crosspoint1:
        crosspoint2 += 1
    else:  # Swap the two cx points
        crosspoint1, crosspoint2 = crosspoint2, crosspoint1
    # Apply crossover between cx points
    for i in range(crosspoint1, crosspoint2):
        # Keep track of the selected values
        temp1 = ind_a[i]
        temp2 = ind_b[i]
        # Swap the matched value
        ind_a[i], ind_a[c1[temp2]] = temp2, temp1
        ind_b[i], ind_b[c2[temp1]] = temp1, temp2
        # Position bookkeeping
        c1[temp1], c1[temp2] = c1[temp2], c1[temp1]
        c2[temp1], c2[temp2] = c2[temp2], c2[temp1]
    return ind_a, ind_b
def main():
    #initial_a, initial_b = [1, 2, 3, 4, 5, 6, 7, 8], [3, 7, 5, 1, 6, 8, 2, 4]
    initial_a, initial_b = [1, 4, 2, 3, 6, 5], [5, 1, 3, 4, 6, 2]
    index_offset = 1
    temp_a = [i - index_offset for i in initial_a]
    temp_b = [i - index_offset for i in initial_b]
    a, b = pmx_crossover(temp_a, temp_b)
    final_a = [i + index_offset for i in a]
    final_b = [i + index_offset for i in b]
    print("Initial: {0:}, {1:}".format(initial_a, initial_b))
    print("Final: {0:}, {1:}".format(final_a, final_b))
if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main()
    print("\nDone.")
Output (one of the possibilities (due to random.randint)):
[cfati#CFATI-5510-0:e:\Work\Dev\StackOverflow\q058424002]> "e:\Work\Dev\VEnvs\py_064_03.07.03_test0\Scripts\python.exe" code00.py
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] 64bit on win32
Initial: [1, 4, 2, 3, 6, 5], [5, 1, 3, 4, 6, 2]
Final: [1, 3, 2, 4, 6, 5], [5, 1, 4, 3, 6, 2]
Done.
c1 is out of range because, in your for loop, at index 4 the value of indA[4] is 6,
while the valid indexes of c1 are 0 to 5 (its length is 6).
So with c1[indA[i]] = i
you try to do c1[6] = 4.
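For reference, here is a minimal sketch of the textbook PMX step described in the question, with caller-chosen cut points so the result is deterministic. It builds one child out of place rather than swapping in place like the code above, and it never uses the values themselves as indexes (it looks positions up with list.index instead), so 1-based values work directly. `pmx_child`, `start`, and `end` are hypothetical names; positions are 0-based and the copied segment is [start, end):

```python
def pmx_child(p1, p2, start, end):
    """Build one PMX child: copy p1[start:end], fill the other positions
    from p2, following the p1 <-> p2 mapping when a value is already used."""
    size = len(p1)
    child = [None] * size
    child[start:end] = p1[start:end]        # 1. copy the segment from parent 1
    segment = set(p1[start:end])
    for i in list(range(start)) + list(range(end, size)):
        value = p2[i]
        # 2. value already in the segment: map it through p1 -> p2 until it is free
        while value in segment:
            value = p2[p1.index(value)]
        child[i] = value
    return child
```

For example, `pmx_child([1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 3, 7, 8, 2, 6, 5, 1, 4], 3, 7)` gives `[9, 3, 2, 4, 5, 6, 7, 1, 8]`; calling it again with the parents swapped gives the second child.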
I have:
import numpy as np
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7, ..., 4])
x = (B/position**2)*dt
A = np.cumsum(x)
assert A[0] == 0 # I want this to be true.
Where B and dt are scalar constants. This is for a numerical integration problem with initial condition of A[0] = 0. Is there a way to set A[0] = 0 and then do a cumsum for everything else?
I don't understand what exactly your problem is, but here are some things you can do to have A[0] = 0.
You can make A one element longer, so that the zero is its first entry:
# initialize example data
import numpy as np
B = 1
dt = 1
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7])
# do calculation
A = np.zeros(len(position) + 1)
A[1:] = np.cumsum((B/position**2)*dt)
Result:
A = [ 0. 0.0625 0.11559096 0.16105356 0.20073547 0.23633533 0.26711403]
len(A) == len(position) + 1
Alternatively, you can subtract the first entry of the result from every element (note that this drops the first term of the sum, so the values differ from the first approach):
# initialize example data
import numpy as np
B = 1
dt = 1
position = np.array([4, 4.34, 4.69, 5.02, 5.3, 5.7])
# do calculation
A = np.cumsum((B/position**2)*dt)
A = A - A[0]
Result:
[ 0. 0.05309096 0.09855356 0.13823547 0.17383533 0.20461403]
len(A) == len(position)
As you see, the results have different lengths. Is one of them what you expect?
1D cumsum
A wrapper around np.cumsum that prepends a 0 as the first element:
def cumsum(pmf):
    cdf = np.empty(len(pmf) + 1, dtype=pmf.dtype)
    cdf[0] = 0
    np.cumsum(pmf, out=cdf[1:])
    return cdf
Example usage:
>>> np.arange(1, 11)
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> cumsum(np.arange(1, 11))
array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55])
N-D cumsum
A wrapper around np.cumsum that prepends a 0 along the summed axis, and works with N-D arrays:
def cumsum(pmf, axis=None, dtype=None):
    if axis is None:
        pmf = pmf.reshape(-1)
        axis = 0
    if dtype is None:
        dtype = pmf.dtype
    idx = [slice(None)] * pmf.ndim
    # Create array with extra element along cumsummed axis.
    shape = list(pmf.shape)
    shape[axis] += 1
    cdf = np.empty(shape, dtype)
    # Set first element to 0.
    idx[axis] = 0
    cdf[tuple(idx)] = 0
    # Perform cumsum on remaining elements.
    idx[axis] = slice(1, None)
    np.cumsum(pmf, axis=axis, dtype=dtype, out=cdf[tuple(idx)])
    return cdf
Example usage:
>>> np.arange(1, 11).reshape(2, 5)
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]])
>>> cumsum(np.arange(1, 11).reshape(2, 5), axis=-1)
array([[ 0, 1, 3, 6, 10, 15],
[ 0, 6, 13, 21, 30, 40]])
I totally understand your pain; I wonder why NumPy doesn't allow this with np.cumsum. Anyway, though I'm really late and there's already another good answer, I prefer this one a bit more:
np.cumsum(np.pad(array, (1, 0), "constant"))
where array in your case is (B/position**2)*dt. You can change the order of np.pad and np.cumsum as well. I'm just adding a zero to the start of the array and calling np.cumsum.
You can use np.roll on the cumulative sum (shift right by 1) and then set the first entry to zero.
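A minimal sketch of the roll approach; `exclusive_cumsum` is a hypothetical name. Unlike the padding answers, the result has the same length as the input, because the grand total wraps around to index 0 and is then overwritten:

```python
import numpy as np

def exclusive_cumsum(x):
    """Running total that starts at 0 and has the same length as x (x non-empty)."""
    A = np.roll(np.cumsum(x), 1)  # shift right by one; the grand total wraps to index 0
    A[0] = 0                      # overwrite the wrapped-around total
    return A
```

For example, `exclusive_cumsum(np.array([1, 2, 3, 4]))` gives `[0, 1, 3, 6]`; the overall total (10) is dropped.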
How to find out which indices belong to the lowest x (say, 5) numbers of an array?
[10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549 , 9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7]
Also, how to directly find the sorted (from low to high) lowest x numbers?
The existing answers are nice, but here's the solution if you're using numpy:
import numpy as np
mylist = np.array([10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549 , 9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7])
x = 5
lowestx = np.argsort(mylist)[:x]
#array([ 2, 3, 5, 10, 4])
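For large arrays, np.argpartition finds the k smallest elements without fully sorting, and a second argsort over just those k puts them in order from low to high. A minimal sketch (`lowest_k` is a hypothetical name; it assumes 1 <= k <= len(values)):

```python
import numpy as np

def lowest_k(values, k):
    arr = np.asarray(values)
    idx = np.argpartition(arr, k - 1)[:k]  # indices of the k smallest, in arbitrary order
    idx = idx[np.argsort(arr[idx])]        # order them from low to high
    return idx, arr[idx]                   # indices and the sorted lowest values
```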
You could do something like this:
>>> l = [5, 1, 2, 4, 6]
>>> sorted(range(len(l)), key=lambda i: l[i])
[1, 2, 3, 0, 4]
mylist = [10.18398473, 9.95722384, 9.41220631, 9.42846614, 9.7300549 , 9.69949144, 9.86997862, 10.28299122, 9.97274071, 10.08966867, 9.7]
# lowest 5
lowest = sorted(mylist)[:5]
# indices of lowest 5
lowest_ind = [i for i, v in enumerate(mylist) if v in lowest]
# 5 indices of lowest 5
import operator
lowest_5ind = [i for i, v in sorted(enumerate(mylist), key=operator.itemgetter(1))[:5]]
[a.index(b) for b in sorted(a)[:5]]
sorted(a)[:x]
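With the standard library alone, heapq.nsmallest over enumerate(...) returns the k smallest (index, value) pairs already sorted from low to high, and unlike a.index(b) or `v in lowest` it gives distinct indices when values are duplicated. A minimal sketch (`lowest_with_indices` is a hypothetical name):

```python
import heapq
from operator import itemgetter

def lowest_with_indices(values, k):
    """Return (indices, values) of the k smallest items, sorted from low to high."""
    pairs = heapq.nsmallest(k, enumerate(values), key=itemgetter(1))
    return [i for i, _ in pairs], [v for _, v in pairs]
```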