Generate bootstrap sample from ndarray - python

Is there a way to generate a bootstrap sample on an N-dimensional array? I am limited to using numpy==1.19.4
I have already tried using a for loop on the other dimensions to no avail, but the following works for 1-dimensional arrays.
import numpy as np
# Set random state and number of resamples
random_state = 42  # was undefined in the original snippet; any integer seed works
np.random.seed(random_state)
n_resamples = 9999
# Generate data
data_1d = np.arange(2, 3, 0.1)
data_nd = np.random.default_rng(42).random((2,3,2))
data = data_1d.copy()
# Resample the data with replacement, computing the test statistic for each set of resamples
bs_samples = [np.std(np.random.choice(data, size=len(data))) for _ in range(n_resamples)]

If I understand your problem correctly, this is the method I usually apply.
Suppose you have this multi-dimensional array:
data_nd = np.random.rand(100, 3, 2)
data_nd.shape #(100, 3, 2)
You can draw bootstrap samples from it this way:
n_resamples = 99
data_nd[np.random.randint(len(data_nd), size=len(data_nd)*n_resamples)].reshape(n_resamples, *data_nd.shape).shape
What I'm doing is randomly drawing indices with replacement (randint) and then reshaping the result to obtain 99 bootstrapped datasets, each with the same dimensions as the original one.
Note that this procedure treats the sub-arrays along the first axis as the "elements", so each element you sample has shape (3,2).
I hope that is clear, but if you have any doubts please let me know.
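To go from the resampled datasets to a bootstrap distribution of a statistic, one option is to draw a 2-D index array and reduce over the resampled axis. This is only a sketch: the variable names, the choice of np.std as the statistic, and the smaller n_resamples are my own assumptions, not part of the question. np.random.default_rng is available in numpy 1.19.

```python
import numpy as np

rng = np.random.default_rng(42)
data_nd = rng.random((100, 3, 2))   # 100 observations, each of shape (3, 2)
n_resamples = 999                   # illustrative value
n = data_nd.shape[0]

# One row of indices per resample, drawn with replacement from 0..n-1
idx = rng.integers(0, n, size=(n_resamples, n))

# Fancy indexing on the first axis gives shape (n_resamples, n, 3, 2)
resamples = data_nd[idx]

# Reduce over the resampled observations (axis 1), separately for each (3, 2) cell
bs_std = resamples.std(axis=1)      # shape (n_resamples, 3, 2)
print(bs_std.shape)
```

Each bs_std[i] is the statistic computed on the i-th resample, so percentiles over axis 0 give per-cell bootstrap intervals.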

Related

Converting a 2D segmentation map to n dimensional numpy masks

I have a segmentation map with 10 classes: a numpy array of size (m,n,1) in which every element is a number from 1 to 10 specifying the class that the pixel belongs to. I want to convert it to an array of size (m,n,10) where each channel is a mask for the elements of that specific class. I can do it using a for loop like this:
for i in range(10):
    mask[:,:,i] = (seg_map==i)[:,:,0]
but I need a faster way to do this. The for loop takes too much time. Is there any built-in function that can outperform the for loop?
Thanks in advance.
One approach:
import numpy as np
np.random.seed(42)
# toy data
data = np.random.randint(0, 10, 20).reshape((5, 4, 1))
# https://stackoverflow.com/a/37323404/4001592
n_values = 10
values = data.flatten()
encoded = np.eye(n_values)[data.ravel()].reshape((5, 4, 10))
One way to verify that the output is correct is to verify that the one-hot encoded value matches back with the index, as below:
match = np.allclose(data.reshape(5, 4), encoded.argmax(-1))
print(match)
Output
True
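A different route, if boolean masks are enough, is to let broadcasting do the comparison against all class labels at once. This is a sketch under the assumption that the labels run from 1 to 10 as stated in the question; the toy seg_map and the names below are made up for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
m, n, n_classes = 5, 4, 10
# Toy segmentation map with labels 1..10 and a trailing singleton axis
seg_map = rng.randint(1, n_classes + 1, size=(m, n, 1))

classes = np.arange(1, n_classes + 1)   # shape (10,)

# (m, n, 1) compared with (10,) broadcasts to (m, n, 10):
# mask[:, :, c] is True where the pixel has label classes[c]
mask = seg_map == classes
print(mask.shape)
```

Because the comparison uses the actual label values, it does not depend on the labels starting at 0, unlike indexing into np.eye.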

Computing grid computations using numpy meshgrid

I have used numpy meshgrids for a long time, and typically find no issues when trying to pass that meshgrid through a function. In my experience it has always been the case that I can define my coordinate space as
x,y,z = numpy.meshgrid(numpy.linspace(-10,10,10),
numpy.linspace(-10,10,10),
numpy.linspace(-10,10,10))
and then can easily compute something like
u,v,w = numpy.sin(x*y)+numpy.cos(z).
My issue has arisen from the need to do a cross product in that calculation. I am defining a field using the meshgrid, and trying to pass the entire meshgrid through the function:
field_equation = lambda x,y,z: sum([parameter*np.cross([wire_x[i],wire_y[i],wire_z[i]],[x,y,z]) for i in range(len(wire))])
Depending on how I try to solve the problem, I get a whole host of problems. The code works fine when passing individual points (x,y,z) through one at a time, but cannot calculate for the entire field. How do I get around this?
np.cross only accepts a vector of size 3, or an nd-array whose last dimension has size 3, so we first need np.stack([x,y,z], axis=-1) to create a 10*10*10*3 nd-array.
The result will be a 10*10*10*3 array. To be able to unpack it into three components later, we need a 3*10*10*10 array, so I move the last axis to the front with np.moveaxis at the end. (swapaxes(0,-1) gives the same shape, but it also exchanges the first and last grid axes, so the components would no longer line up with x, y, z.)
In the code below, I also take the liberty of shortening the code for wire a little, assuming wire_x, wire_y, wire_z are just the 3 components of wire.
import numpy as np
# test data
x,y,z = np.meshgrid(np.linspace(-10,10,10),
np.linspace(-10,10,10),
np.linspace(-10,10,10))
wire = [[1,2,3,4], [5,6,7,8], [3,4,5,6]]
parameter = 1
field_equation = lambda x,y,z: np.moveaxis(sum([parameter*np.cross(w, np.stack([x,y,z], axis=-1)) for w in zip(*wire)]), -1, 0)
a,b,c = field_equation(x,y,z)
print(a.shape, b.shape, c.shape)
#(10, 10, 10) (10, 10, 10) (10, 10, 10)
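A quick way to convince yourself that a vectorized version is right is to compare one grid point against the one-point-at-a-time calculation that already works. The sketch below assumes the same toy wire and parameter as above; the function name field and the chosen indices are my own.

```python
import numpy as np

axis = np.linspace(-10, 10, 10)
x, y, z = np.meshgrid(axis, axis, axis)
wire = [[1, 2, 3, 4], [5, 6, 7, 8], [3, 4, 5, 6]]  # x, y, z components of 4 wire points
parameter = 1

def field(x, y, z):
    # Stack coordinates on a new last axis so np.cross sees 3-vectors
    points = np.stack([x, y, z], axis=-1)
    total = sum(parameter * np.cross(w, points) for w in zip(*wire))
    # Bring the component axis to the front so the result can be unpacked
    return np.moveaxis(total, -1, 0)

u, v, w_comp = field(x, y, z)

# Pointwise reference at an arbitrary grid index
i, j, k = 2, 5, 7
p = [x[i, j, k], y[i, j, k], z[i, j, k]]
expected = sum(parameter * np.cross(w, p) for w in zip(*wire))
print(np.allclose([u[i, j, k], v[i, j, k], w_comp[i, j, k]], expected))
```

The check compares each component at index (i, j, k) with the cross product computed from the coordinates stored at that same index.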

How to add several vectors to numpy structured array and call matrix later from fieldname?

Hey guys, I need help.
I want to use TensorFlow's data import, where data is loaded by calling the feature/label vectors from a structured numpy array.
https://www.tensorflow.org/programmers_guide/datasets#consuming_numpy_arrays
I want to create such a structured array by consecutively adding the 2 vectors (feature_vec and label_vec) to a numpy structured array.
import numpy as np
# example vectors
feature_vec= np.arange(10)
label_vec = np.arange(10)
# structured array which should get the vectors
struc_array = np.array([feature_vec,label_vec],dtype=([('features',np.float32), ('labels',np.float32)]))
# How can I add now new vectors to struc_array?
struc_array.append(---)
Later, when this array is loaded from file, I want to get either the feature vectors or the label vectors (each of which is a matrix by then) by using the field name:
with np.load("/var/data/training_data.npy") as data:
features = data["features"] # matrix containing feature vectors as rows
labels = data["labels"] #matrix containing labels vectors as rows
Everything I tried to code was complete crap.. never got a correct output..
Thanks for your help!
Don't create a NumPy array and then append to it. That doesn't really make sense, as NumPy arrays have a fixed size and require a full copy to append a single row or column. Instead, create a list, append to it, then construct the array at the end:
vecs = [feature_vec,label_vec]
dtype = [('features',np.float32), ('labels',np.float32)]
# append as many times as you want:
vecs.append(other_vec)
dtype.append(('other', np.float32))
struc_array = np.array(vecs, dtype=dtype)
Of course, you probably need ot
Unfortunately, this doesn't solve the problem.
I want to get just the labels or the features from the structured array by using:
labels = struc_array['labels']
features = struc_array['features']
But when I build the structured array the way you did, both labels and features contain all of the appended vectors:
import numpy as np
feature_vec= np.arange(10)
label_vec = np.arange(0,5,0.5)
vecs = [feature_vec,label_vec]
dtype = [('features',np.float32), ('labels',np.float32)]
other_vec = np.arange(6,11,0.5)
vecs.append(other_vec)
dtype.append(('other', np.float32))
struc_array = np.array(vecs, dtype=dtype)
# This contains all vectors.. not just the labels vector
labels = struc_array['labels']
# This also contains all vectors.. not just the feature vector
features = struc_array['features']
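One layout that does give a separate matrix per field name is to make each field a fixed-length sub-array and to append one (features, labels) tuple per record. This is a sketch, not something proposed in the thread: it assumes every vector has the same known length (10 here), and the second record's feature values are made up.

```python
import numpy as np

vec_len = 10  # assumed fixed length of every feature/label vector

# Each field holds a whole vector of length vec_len per record
dtype = [('features', np.float32, (vec_len,)),
         ('labels', np.float32, (vec_len,))]

# Collect records in a list, one (features, labels) tuple per append
rows = []
rows.append((np.arange(10), np.arange(0, 5, 0.5)))
rows.append((np.arange(10, 20), np.arange(6, 11, 0.5)))

# Build the structured array once, at the end
struc_array = np.array(rows, dtype=dtype)

features = struc_array['features']  # matrix with one feature vector per row
labels = struc_array['labels']      # matrix with one label vector per row
print(features.shape, labels.shape)
```

With this layout the field name selects only its own vectors, which is the access pattern shown in the question.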

How to concatenate N different 1D arrays in python

I am new to python. I have to implement k-fold cross validation in python. I am able to split the given data into k equal-sized arrays, but I am not able to concatenate the k-1 arrays that will form the training data set. I know about concatenate() in numpy, but as k is determined on the fly I am not sure how to use it in this scenario. I would appreciate any info in this regard. Thanks in advance.
Check out numpy.vstack. This stacks a sequence of arrays on top of each other (assuming the column dimensions match). hstack stacks them side by side instead. Both accept a list of any length, so k does not need to be known in advance.
import numpy as np
k = 10
all_data = [np.random.random((10,5)) for i in range(k)]
train = all_data[:k-1] #list of 9 (10,5) arrays
test = all_data[k-1] #one (10,5) array
train = np.vstack(train) #stacks them on top of each other
print(train.shape) # one (90, 5) array
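For the 1D case in the question, np.concatenate also takes a plain list of arrays, so the training set for each fold can be built by slicing the list of folds. A minimal sketch, assuming the data is a 1D array and using np.array_split for the split (the toy data and names are illustrative):

```python
import numpy as np

data = np.arange(20)   # toy 1D data set
k = 4                  # number of folds, can be chosen at run time

# Split into k (nearly) equal 1D arrays; returns a list
folds = np.array_split(data, k)

for i in range(k):
    test = folds[i]
    # Join every fold except the i-th into one 1D training array
    train = np.concatenate(folds[:i] + folds[i + 1:])
    print(i, test.shape, train.shape)
```

Since folds is an ordinary Python list, leaving one fold out is just list slicing.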

Efficient two dimensional numpy array statistics

I have many 100x100 grids, is there an efficient way using numpy to calculate the median for every grid point and return just one 100x100 grid with the median values? Presently, I'm using a for loop to run through each grid point, calculating the median and then combining them into one grid at the end. I'm sure there's a better way to do this using numpy. Any help would be appreciated! Thanks!
Create a 100x100xN array (or stack the grids together if that's not possible) and use np.median with the correct axis to do it in one go:
import numpy as np
a = np.random.rand(100,100)
b = np.random.rand(100,100)
c = np.random.rand(100,100)
d = np.dstack((a,b,c))
result = np.median(d,axis=2)
How many grids are there?
One option would be to create a 3D array that is 100x100xnumGrids and compute the median across the 3rd dimension.
Use the axis parameter of median:
import numpy as np
data = np.random.rand(100, 5, 5)
print(np.median(data, axis=0))
print(np.median(data[:, 0, 0]))
print(np.median(data[:, 1, 0]))
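If the grids live in a Python list of unknown length, np.stack puts them along a new axis without naming each one. A small sketch, assuming all grids share the same 100x100 shape (the number of grids and the random data are made up):

```python
import numpy as np

rng = np.random.RandomState(0)
# Toy input: a list of 7 grids, each 100x100
grids = [rng.rand(100, 100) for _ in range(7)]

# Stack along a new leading axis: shape (7, 100, 100)
stacked = np.stack(grids, axis=0)

# Median across the grids for every grid point: shape (100, 100)
median_grid = np.median(stacked, axis=0)
print(median_grid.shape)
```

Stacking on axis 0 rather than with dstack only changes which axis you pass to np.median; the resulting grid is the same.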
