Create an array of size n with initialized value - python

In Python, it's possible to create a list of size n with [None] * n, or to initialize every item to a fixed value with [0] * n.
Is it possible to do something similar but initialize each item to 500 times its index?
For example, creating an array of size 10 would result in this:
[0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
I can easily achieve this with a variable and a for loop, as shown below, but I would like to do it in a single expression that I can pass into Robot Framework.
arr = [0] * 10
for i, v in enumerate(arr):
    arr[i] = 500 * i

Use a simple comprehension:
[i*500 for i in range(n)]

The other answer gives you one way; here's another:
list(range(0, 500*n, 500))

NumPy arrays are often a good choice for numeric data. They offer more functionality and are fast to work with (vectorized operations backed by compiled C code under the hood). Your example with NumPy:
import numpy as np
nx = 10
a = 500*np.arange(nx)
a
gives:
array([ 0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500])

Related

How to get corresponding numpy array values after performing calculation on another related array

I have a function that returns two NumPy arrays (width and height) like so:
width, height = calc_heigh_width(data)
width
>>> array([390, 20, 65, 1000])
height
>>> array([2, 7, 3, 1])
Imagine the widths as being points on an x-axis, so they go from 0 to 1000 in this example, similarly for the height.
I am only interested in places where the current width exceeds the previous width by more than 600 but less than 800. So I wrote this to solve that:
import numpy as np
sorted_width_array = np.sort(width)
width_calc_list = []
for i in range(len(sorted_width_array) - 1):
    if ((sorted_width_array[i+1] - sorted_width_array[i]) > 600) and ((sorted_width_array[i+1] - sorted_width_array[i]) < 800):
        width_calc_list.append(sorted_width_array[i])
        width_calc_list.append(sorted_width_array[i+1])
This returns
width_calc_list
>>> [390, 1000]
However, I would also like to get the height values that correspond to these width values but I just haven't been able to solve it. Any help is much appreciated.
So basically, I'd like to get something like this
height_calc_list
>>> [2, 1]
PS: I know my current code to find the width works but if there's any way to improve it I'm all ears. I was playing around with NumPy's diff function and I'm optimistic that will work but haven't been able to get it to work.
You can just zip the values and use the same algorithm you have written. I did not check the correctness of your solution; I only changed it so that it returns the format you want:
width = [390, 20, 65, 1000]
height = [2, 7, 3, 1]
cord = sorted(list(zip(width, height)), key=lambda tup: tup[0])
print(cord)
width_calc_list = []
for i in range(len(cord) - 1):
    if ((cord[i+1][0] - cord[i][0]) > 600) and ((cord[i+1][0] - cord[i][0]) < 800):
        width_calc_list.append(cord[i])
        width_calc_list.append(cord[i+1])
print(width_calc_list)
#[(390, 2), (1000, 1)]
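Since the question mentions np.diff, here is a minimal NumPy sketch of the same idea. It assumes the two arrays are aligned element by element; the names order, gaps, hit and pairs are illustrative and not from the original code. np.argsort is used so that the heights can be reordered together with the widths.

```python
import numpy as np

width = np.array([390, 20, 65, 1000])
height = np.array([2, 7, 3, 1])

# Sort both arrays by width so that each height stays with its width.
order = np.argsort(width)
w, h = width[order], height[order]

# Gaps between consecutive sorted widths.
gaps = np.diff(w)

# Positions i where 600 < w[i+1] - w[i] < 800.
hit = np.flatnonzero((gaps > 600) & (gaps < 800))

# Keep both ends of every qualifying gap: i and i + 1.
pairs = np.column_stack((hit, hit + 1)).ravel()

width_calc = w[pairs]
height_calc = h[pairs]
print(width_calc, height_calc)
```

Like the loop version, this repeats a width when it belongs to two consecutive qualifying gaps.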

Consecutively split an array by the next max value

Suppose I have an array (the elements can be floats also):
D = np.array([0,0,600,160,0,1200,1800,0,1800,900,900,300,1400,1500,320,0,0,250])
The goal is, starting from the beginning of the array, to find the max value (the last one if there are several equal ones) and split off everything up to and including it. Then repeat this procedure on the remainder until the end of the array. So, the expected result would be:
[[0,0,600,160,0,1200,1800,0,1800],
[900,900,300,1400,1500],
[320],
[0,0,250]]
I managed to find the last max value:
D_rev = D[::-1]
last_max_index = len(D_rev) - np.argmax(D_rev) - 1
i.e. I can get the first subarray of the desired answer. And then I can use a loop to get the rest.
My question is: is there a NumPy way to do it without looping?
IIUC, you can take the reverse cumulative maximum of D (see np.maximum.accumulate) to form groups, then split with itertools.groupby:
D = np.array([0,0,600,160,0,1200,1800,0,1800,900,900,300,1400,1500,320,0,0,250])
groups = np.maximum.accumulate(D[::-1])[::-1]
# array([1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1500, 1500,
# 1500, 1500, 1500, 320, 250, 250, 250])
from itertools import groupby
out = [list(list(zip(*g))[0]) for _, g in groupby(zip(D, groups), lambda x: x[1])]
# [[0, 0, 600, 160, 0, 1200, 1800, 0, 1800],
# [900, 900, 300, 1400, 1500],
# [320],
# [0, 0, 250]]
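If a list of NumPy arrays is acceptable as output, the groupby step could probably be replaced by np.split. The sketch below reuses the reverse cumulative maximum from above; the names cuts and parts are illustrative. It cuts D wherever the running group value changes.

```python
import numpy as np

D = np.array([0, 0, 600, 160, 0, 1200, 1800, 0, 1800,
              900, 900, 300, 1400, 1500, 320, 0, 0, 250])

# Reverse cumulative maximum: each element is labelled with the
# largest value found at or after its position.
groups = np.maximum.accumulate(D[::-1])[::-1]

# A new chunk starts wherever that label changes.
cuts = np.flatnonzero(np.diff(groups) != 0) + 1

# np.split returns a list of sub-arrays (views into D).
parts = np.split(D, cuts)
for p in parts:
    print(p.tolist())
```

Because the label is non-increasing, equal maxima always fall in the same chunk, which matches the "last one if there are several equal ones" requirement.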

Get all possible chunks of length N

I have an array with shape (100000,) over which I want to apply a sliding window of length 200 with a step size of 1. This means that the output array will have the shape (99801, 200), i.e., all unique chunks of length 200. I cannot find an efficient function in NumPy that achieves this. I have tried:
for i in range(data.shape[0] - 200):
    windows = np.append(windows, data[i:i+200])
This not only produces the wrong shape (1D), but is also incredibly slow. Is there a fast function in NumPy to do this?
Try stride_tricks in NumPy. It uses no memory beyond the original array a; instead it creates a (virtual) strided view containing all the sliding windows. Because the rows share memory with a, writing to the view also changes a.
import numpy as np

def slide(a, size):
    stride = a.strides[0]   # bytes between consecutive elements of a
    n = a.size - size + 1   # number of windows
    return np.lib.stride_tricks.as_strided(a, shape=(n, size), strides=(stride, stride))

a = np.arange(100000)
slide(a, size=200)
>>>array([[ 0, 1, 2, ..., 197, 198, 199],
[ 1, 2, 3, ..., 198, 199, 200],
[ 2, 3, 4, ..., 199, 200, 201],
...,
[99798, 99799, 99800, ..., 99995, 99996, 99997],
[99799, 99800, 99801, ..., 99996, 99997, 99998],
[99800, 99801, 99802, ..., 99997, 99998, 99999]])
Here's a numpy answer
window_size = 10
i = np.arange(data.size - window_size + 1)
indices = np.add(np.array([np.arange(window_size)] * (data.size - window_size + 1)), i.reshape(-1, 1))
windows = data[indices]
Best function I've seen for this (non-numpy) is skimage.util.view_as_windows()
from skimage.util import view_as_windows
windows = view_as_windows(data, 200)
If you want NumPy-only, the recipe in the dupe target is the most general answer, although @swag2198 suggests a more lightweight version in another answer here.
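For a NumPy-only route, newer NumPy versions (1.20 and later) also provide numpy.lib.stride_tricks.sliding_window_view, which wraps the strided-view idea in a safer interface. A minimal sketch, assuming data is a 1D array like the one in the question:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(100000)

# Returns a read-only view by default; no window data is copied.
windows = sliding_window_view(data, 200)

print(windows.shape)   # (99801, 200)
print(windows[-1, 0])  # 99800, the start of the last window
```

Call windows.copy() if an independent, writable array is needed, keeping in mind that the copy occupies the full (99801, 200) block of memory.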

How to find the nearest neighbour index from one series to another

I have a target array A, which represents isobaric pressure levels in NCEP reanalysis data.
I also have the pressure at which a cloud is observed as a long time series, B.
What I am looking for is a k-nearest-neighbour lookup that returns the indices of those nearest neighbours, something like knnsearch in Matlab, which in Python might look like: indices, distance = knnsearch(A, B, n)
where indices holds the n nearest indices in A for every value in B, and distance is how far each value in B is from those nearest values in A. A and B can have different lengths (this is the bottleneck with most solutions I have found so far, which force me to loop over each value in B to get my indices and distances).
import numpy as np
A = np.array([1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10]) # this is a fixed 17-by-1 array
B = np.array([923, 584.2, 605.3, 153.2]) # this can be any n-by-1 array
n = 2
What I would like returned from indices, distance = knnsearch(A, B, n) is this:
indices = [[1, 2],[4, 5] etc...]
where 923 in B is matched first to A[1]=925 and then to A[2]=850
and 584.2 in B is matched first to A[4]=600 and then to A[5]=500
distance = [[2, 73],[15.8, 84.2] etc...]
where 2 is the distance between the queried value in B and its nearest value in A, e.g. distance[0, 0] == np.abs(B[0] - A[1])
The only solution I have been able to come up with is:
import numpy as np
def knnsearch(A, B, n):
    indices = np.zeros((len(B), n))
    distances = np.zeros((len(B), n))
    for i in range(len(B)):
        a = A
        for N in range(n):
            dif = np.abs(a - B[i])
            ind = np.argmin(dif)
            indices[i, N] = ind + N
            distances[i, N] = dif[ind + N]
            # remove this neighbour from future consideration
            # (np.delete returns a new array; the result is discarded here, so a is unchanged)
            np.delete(a, ind)
    return indices, distances
array_A = np.array([1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10])
array_B = np.array([923, 584.2, 605.3, 153.2])
neighbours = 2
indices, distances = knnsearch(array_A, array_B, neighbours)
print(indices)
print(distances)
returns:
[[ 1. 2.]
[ 4. 5.]
[ 4. 3.]
[10. 11.]]
[[ 2. 73. ]
[ 15.8 84.2]
[ 5.3 94.7]
[ 3.2 53.2]]
There must be a way to remove the for loops, as I need the performance should my A and B arrays contain many thousands of elements with many nearest neighbours...
Please help! Thanks :)
The second loop can easily be vectorized. The most straightforward way to do it is to use np.argsort and select the indices corresponding to the n smallest dif values. However, for large arrays, since only the n smallest values need to be sorted, it is better to use np.argpartition.
Therefore, the code would look something like this:
def vector_knnsearch(A, B, n):
    indices = np.empty((len(B), n), dtype=int)
    distances = np.empty((len(B), n))
    for i, b in enumerate(B):
        dif = np.abs(A - b)
        min_ind = np.argpartition(dif, n)[:n]  # indices of the n smallest distances,
                                               # not necessarily in sorted order
        ind = min_ind[np.argsort(dif[min_ind])]  # sort the n candidates by distance
        indices[i, :] = ind
        distances[i, :] = dif[ind]
    return indices, distances
As said in the comments, the first loop can also be removed using a meshgrid. However, the extra memory and computation time needed to construct the meshgrid make this approach slower for the dimensions I tried (and this will probably get worse for large arrays and end in a MemoryError). In addition, the readability of the code decreases. Overall, this probably makes the approach less Pythonic.
def mesh_knnsearch(A, B, n):
    m = len(B)
    rng = np.arange(m).reshape((m,1))
    Amesh, Bmesh = np.meshgrid(A,B)
    dif = np.abs(Amesh-Bmesh)
    min_ind = np.argpartition(dif,n,axis=1)[:,:n]
    ind = min_ind[rng,np.argsort(dif[rng,min_ind],axis=1)]
    return ind, dif[rng,ind]
Note that it is important to define rng as a 2D array in order to retrieve a[rng[0],ind[0]], a[rng[1],ind[1]], etc. and maintain the dimensions of the array, as opposed to a[:,ind], which retrieves a[:,ind[0]], a[:,ind[1]], etc.
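The same fully vectorized idea can also be sketched with broadcasting and np.take_along_axis, which avoids the explicit rng index array. This is an illustrative variant under the same memory caveat as the meshgrid version (the full len(B) by len(A) distance matrix is built); the function name broadcast_knnsearch is hypothetical, and a plain np.argsort is used for brevity instead of np.argpartition.

```python
import numpy as np

def broadcast_knnsearch(A, B, n):
    # Pairwise absolute differences, shape (len(B), len(A)).
    dif = np.abs(B[:, None] - A[None, :])
    # Column indices of the n smallest distances in each row, nearest first.
    ind = np.argsort(dif, axis=1)[:, :n]
    # Pick the matching distances row by row.
    return ind, np.take_along_axis(dif, ind, axis=1)

A = np.array([1000, 925, 850, 700, 600, 500, 400, 300, 250,
              200, 150, 100, 70, 50, 30, 20, 10])
B = np.array([923, 584.2, 605.3, 153.2])

ind, dist = broadcast_knnsearch(A, B, 2)
print(ind)
print(dist)
```

For the last query value, 153.2, this returns indices 10 and 9, because A[9]=200 (distance 46.8) is closer than A[11]=100 (distance 53.2).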

iterating through float - variable input can change to float

I am calculating protein capacities (steric mass action model) within several loops (I know, filling up a NumPy array can be quite slow and there are faster methods, but it works for now):
import numpy as np
a = [10,20,30] # salt concentrations tested
b = [4,5,6] # measured data points
c = 2 # number of components
q = np.empty((c,len(a),len(b)))
for ii,cs in enumerate(a):
    for iii,cp in enumerate(b):
        for i in range(c):
            q[i,ii,iii] = cs*cp
Basically, q contains the measured data points for each component at each salt concentration and has the shape (number of components, number of salt concentrations, number of measurements). The code works fine. However, if I use only one salt concentration, the line for ii,cs in enumerate(a): does not work anymore (float object is not iterable).
I can use if statements, but is there a better way (with less confusing code)?
When you use a single salt concentration, instead of writing
a = 2
write
a = [2]
This way you'll keep it as a list and your code will still work.
By the way, you can compute q using the following NumPy one-liner:
In [39]: np.tile(np.outer(a, b), (c, 1, 1))
Out[39]:
array([[[ 40, 50, 60],
[ 80, 100, 120],
[120, 150, 180]],
[[ 40, 50, 60],
[ 80, 100, 120],
[120, 150, 180]]])
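If the input may arrive as either a scalar or a list, one option is to normalize it with np.atleast_1d before computing. The sketch below wraps the one-liner above in a hypothetical helper named capacities; the product cs*cp is the placeholder formula from the question, not the real steric mass action model.

```python
import numpy as np

def capacities(a, b, c):
    # Turn scalars into 1-element arrays; lists and arrays pass through.
    a = np.atleast_1d(a)
    b = np.atleast_1d(b)
    # Shape: (components, salt concentrations, measurements).
    return np.tile(np.outer(a, b), (c, 1, 1))

q_one = capacities(10, [4, 5, 6], 2)             # single salt concentration
q_many = capacities([10, 20, 30], [4, 5, 6], 2)  # several concentrations

print(q_one.shape)   # (2, 1, 3)
print(q_many.shape)  # (2, 3, 3)
```

This keeps the calling code free of if statements, since both cases go through the same path.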
