I am new to Python. I have to implement k-fold cross-validation in Python. I am able to split the given data into k equal-sized arrays, but I can't concatenate the k-1 arrays that make up the training set. I know about numpy.concatenate(), but since k is determined at run time I'm not sure how to use it in this scenario. I'd appreciate any info on this. Thanks in advance.
Check out numpy.vstack. It stacks an iterable of arrays on top of each other (assuming the column dimensions match). numpy.hstack instead stacks them side by side (assuming the row counts match).
import numpy as np
k = 10
all_data = [np.random.random((10,5)) for i in range(k)]
train = all_data[:k-1] #list of 9 (10,5) arrays
test = all_data[k-1] #one (10,5) array
train = np.vstack(train) #stacks them on top of each other
print(train.shape)  # one (90, 5) array
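Because k is only known at run time, one option is to keep the folds in a Python list and concatenate every fold except the held-out one, using list slicing. The following is a minimal sketch that assumes the rows of a single array are the samples. The function name k_fold_splits is hypothetical. np.array_split and np.concatenate are standard NumPy, and np.array_split also tolerates sizes that k does not divide evenly.

```python
import numpy as np

def k_fold_splits(data, k):
    """Yield (train, test) pairs for k-fold cross-validation (hypothetical helper).

    The rows of `data` are split into k nearly equal folds. For fold i the
    test set is folds[i] and the training set is the other k-1 folds,
    joined along axis 0 (equivalent to np.vstack for 2-D arrays).
    """
    folds = np.array_split(data, k)  # list of k arrays
    for i in range(k):
        test = folds[i]
        train = np.concatenate(folds[:i] + folds[i + 1:])  # list concat, then array concat
        yield train, test

data = np.arange(100 * 5).reshape(100, 5)  # toy data: 100 samples, 5 features
splits = list(k_fold_splits(data, k=10))
```

Each of the 10 splits then has a (90, 5) training array and a (10, 5) test array.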
Is there a way to generate a bootstrap sample on an N-dimensional array? I am limited to using numpy==1.19.4
I have already tried using a for loop on the other dimensions to no avail, but the following works for 1-dimensional arrays.
import numpy as np
# Set random state and number of resamples
random_state = 42
np.random.seed(random_state)  # seeds the global RNG used by np.random.choice below
n_resamples = 9999
# Generate data
data_1d = np.arange(2, 3, 0.1)
data_nd = np.random.default_rng(42).random((2,3,2))
data = data_1d.copy()
# Resample the data with replacement, computing the test statistic for each set of resamples
bs_samples = [np.std(np.random.choice(data, size=len(data))) for _ in range(n_resamples)]
If I understand your problem correctly, this is the method I usually apply.
Suppose you have this multi-dimensional array:
data_nd = np.random.rand(100, 3, 2)
data_nd.shape #(100, 3, 2)
You can draw bootstrap samples like this:
n_resamples = 99
idx = np.random.randint(len(data_nd), size=len(data_nd)*n_resamples)  # indices drawn with replacement
bs_samples = data_nd[idx].reshape(n_resamples, *data_nd.shape)
bs_samples.shape  # (99, 100, 3, 2)
This draws random indices (randint) with replacement and then reshapes the result into 99 bootstrapped datasets. Each one has the same dimensions as the original.
Note that this procedure treats the sub-arrays along the first axis as the "elements", so each element you sample has shape (3, 2).
I hope that is clear, but if you have any doubt please let me know.
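You can then compute the statistic from the question (np.std) for each resample. The sketch below assumes you want one standard deviation per (3, 2) position, taken over the sampled axis. It also draws a 2-D index array, so the reshape step is unnecessary. The Generator API (np.random.default_rng and Generator.integers) has been available since NumPy 1.17, so it works with numpy==1.19.4. The 999 resamples are an arbitrary choice that keeps memory modest.

```python
import numpy as np

rng = np.random.default_rng(42)
data_nd = rng.random((100, 3, 2))
n_resamples = 999

# One row of indices per resample; values lie in [0, n), drawn with replacement.
n = data_nd.shape[0]
idx = rng.integers(0, n, size=(n_resamples, n))

# Fancy indexing with a 2-D index array directly yields (n_resamples, n, 3, 2).
bs_samples = data_nd[idx]

# Statistic per resample over the sampled axis: shape (n_resamples, 3, 2).
bs_std = bs_samples.std(axis=1)
```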
Hey guys, I need help.
I want to use TensorFlow's data import, where the data is loaded by reading the feature/label vectors from a structured NumPy array.
https://www.tensorflow.org/programmers_guide/datasets#consuming_numpy_arrays
I want to create such a structured array by successively adding the two vectors (feature_vec and label_vec) to a NumPy structured array.
import numpy as np
# example vectors
feature_vec= np.arange(10)
label_vec = np.arange(10)
# structured array which should get the vectors
struc_array = np.array([feature_vec,label_vec],dtype=([('features',np.float32), ('labels',np.float32)]))
# How can I add now new vectors to struc_array?
struc_array.append(---)
Later, when this array is loaded from a file, I want to access either the feature vectors (which form a matrix) or the label vectors by field name:
data = np.load("/var/data/training_data.npy")  # a .npy file loads as a plain ndarray
features = data["features"]  # matrix containing feature vectors as rows
labels = data["labels"]  # matrix containing label vectors as rows
Nothing I tried to code worked; I never got the correct output.
Thanks for your help!
Don't create a NumPy array and then append to it. That doesn't really make sense, as NumPy arrays have a fixed size and require a full copy to append a single row or column. Instead, create a list, append to it, then construct the array at the end:
vecs = [feature_vec,label_vec]
dtype = [('features',np.float32), ('labels',np.float32)]
# append as many times as you want:
vecs.append(other_vec)
dtype.append(('other', np.float32))
struc_array = np.array(vecs, dtype=dtype)
Of course, you probably need to …
Unfortunately, this doesn't solve the problem.
I want to get just the labels or just the features from the structured array by using:
labels = struc_array['labels']
features = struc_array['features']
But when I build the structured array as you did, both labels and features contain all of the appended vectors:
import numpy as np
feature_vec= np.arange(10)
label_vec = np.arange(0,5,0.5)
vecs = [feature_vec,label_vec]
dtype = [('features',np.float32), ('labels',np.float32)]
other_vec = np.arange(6,11,0.5)
vecs.append(other_vec)
dtype.append(('other', np.float32))
struc_array = np.array(vecs, dtype=dtype)
# This contains all vectors.. not just the labels vector
labels = struc_array['labels']
# This also contains all vectors.. not just the feature vector
features = struc_array['features']
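A different approach gives one record per example. In it, each field holds a whole vector, declared as a subarray field via the (name, type, shape) dtype form. You collect the (feature_vec, label_vec) pairs in a list and build the array once. The sketch below uses hypothetical sizes of 10 features and 10 label values per example, and a temporary file path chosen only for illustration.

```python
import os
import tempfile

import numpy as np

n_features, n_labels = 10, 10  # hypothetical vector lengths
dtype = np.dtype([('features', np.float32, (n_features,)),
                  ('labels', np.float32, (n_labels,))])

# Appending to a plain list is cheap; build the structured array once at the end.
rows = []
for i in range(3):
    feature_vec = np.arange(n_features) + i
    label_vec = np.arange(n_labels) * 0.5 + i
    rows.append((feature_vec, label_vec))

struc_array = np.array(rows, dtype=dtype)  # shape (3,), one record per example
features = struc_array['features']         # shape (3, 10): feature vectors as rows
labels = struc_array['labels']             # shape (3, 10): label vectors as rows

# Round trip through a .npy file (illustrative temporary path).
path = os.path.join(tempfile.mkdtemp(), "training_data.npy")
np.save(path, struc_array)
data = np.load(path)  # plain structured ndarray; no `with` block needed
```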
I have a numpy 2D array with values that range from 0 to 59.
For those familiar with deep learning, and specifically image segmentation: I create the array (call it L) from a .png image, and the value of each pixel L[x,y] is the class that pixel belongs to (out of 60 classes).
I want to create a one-hot tensor, Lhot, in which Lhot[x,y,z] == 1 if L[x,y] == z, and 0 otherwise.
I want to create it with broadcasting or indexing in one or two lines, without loops.
It should be functionally equivalent to this code (where Dtype corresponds to the dtype of L):
Lhot = np.zeros((L.shape[0], L.shape[1], 60), dtype=Dtype)
for i in range(L.shape[0]):
    for j in range(L.shape[1]):
        Lhot[i,j,L[i,j]] = 1
Does anyone have an idea?
Thanks!
A much faster and cleaner way using pure NumPy:
Lhot = np.eye(60, dtype=L.dtype)[L]  # row L[x,y] of the identity is the one-hot vector; shape (H, W, 60)
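The sketch below checks the identity-matrix indexing against the loop from the question. It also shows a broadcasting variant, which the question asked for. The 4x5 label image is a random stand-in for the .png labels, and all names are illustrative.

```python
import numpy as np

num_classes = 60
rng = np.random.default_rng(0)
L = rng.integers(0, num_classes, size=(4, 5))  # toy label image, values 0..59

# Reference: the loop from the question.
Lhot_loop = np.zeros((L.shape[0], L.shape[1], num_classes), dtype=L.dtype)
for i in range(L.shape[0]):
    for j in range(L.shape[1]):
        Lhot_loop[i, j, L[i, j]] = 1

# Indexing: selecting row L[x, y] of the identity gives the one-hot vector,
# so the result already has shape (H, W, num_classes); no transpose needed.
Lhot_eye = np.eye(num_classes, dtype=L.dtype)[L]

# Broadcasting: compare each label with every class index 0..59.
Lhot_bcast = (L[..., np.newaxis] == np.arange(num_classes)).astype(L.dtype)
```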
The problem you'll run into with multidimensional one-hot arrays is that they get very large and very sparse. There's no good way to handle sparse arrays with more than two dimensions in NumPy/SciPy, and I think the same is true of sklearn and many other ML packages. Do you really need an n-D one-hot?
Since one-hot encoding is typically defined for 1-D vectors, all you have to do is flatten your matrix, apply the one-hot encoder from scikit-learn (or any other library with one-hot encoding) and reshape back.
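import numpy as np
from sklearn.preprocessing import OneHotEncoder
n, m = L.shape
k = 60
# categories lists all 60 classes up front, so classes absent from L still get a column
Lhot = OneHotEncoder(categories=[np.arange(k)]).fit_transform(L.reshape(-1, 1)).toarray().reshape(n, m, k)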
Of course, you can also do it by hand:
n, m = L.shape
k = 60
Lhot = np.zeros((n*m, k)) # empty, flat array
Lhot[np.arange(n*m), L.flatten()] = 1 # one-hot encoding for 1D
Lhot = Lhot.reshape(n, m, k) # reshaping back to 3D tensor
I'm currently facing a problem with permuting two NumPy arrays that have different numbers of rows. I know how to use the np.random.shuffle function, but I can't find a solution to my specific problem. The examples in the NumPy documentation only cover arrays with the same number of rows, e.g. x.shape == (10, 784) and y.shape == (10, 784).
I want to randomly permute the columns of both arrays in the same order. The arrays have shapes x.shape == (60000, 784) and y.shape == (10000, 784).
For example, if
x[59000] = [0,1,2,3,4,5,6,7,8,9]
y[9999] = [0,1,2,3,4,5,6,7,8,9]
then after the permutation both should be shuffled in the same way, e.g.
x[59000] = [3,0,1,6,7,2,9,8,4,5]
y[9999] = [3,0,1,6,7,2,9,8,4,5]
The shuffle order needs to be the same for both arrays, even though they have different numbers of rows. I get "ValueError: Found input variables with inconsistent numbers of samples: [60000, 10000]". Any ideas on how to fix this? I'd really appreciate any help!
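Stick the arrays together and permute the columns of the combined array:
import numpy
merged = numpy.concatenate([x, y])  # shape (70000, 784)
numpy.random.shuffle(merged.T)  # shuffles rows of merged.T, i.e. columns of merged, in place
x, y = numpy.split(merged, [x.shape[0]])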
Also check older threads, such as:
Better way to shuffle two numpy arrays in unison
Or compute a permutation ahead of time and apply it to the columns of both arrays:
perm = np.random.permutation(x.shape[1])  # one random ordering of the 784 column indices
x = x[:, perm]
y = y[:, perm]
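Here is a small check that a single shared permutation reorders the columns of both arrays identically, even though their row counts differ. The toy arrays stand in for the (60000, 784) and (10000, 784) arrays. np.argsort of a permutation gives its inverse, and the last step uses that inverse to restore the original column order.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins: same number of columns, different numbers of rows.
x = np.tile(np.arange(10), (6, 1))  # shape (6, 10), every row is 0..9
y = np.tile(np.arange(10), (2, 1))  # shape (2, 10)

# One permutation of the column indices, shared by both arrays.
perm = rng.permutation(x.shape[1])
x_shuf = x[:, perm]
y_shuf = y[:, perm]

# argsort of a permutation is its inverse: undo the column shuffle.
inv = np.argsort(perm)
x_restored = x_shuf[:, inv]
```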
I used the MNIST dataset for training a neural network, where the training data is returned as a tuple with two entries. The first entry contains the actual training images. This is a numpy ndarray with 50,000 entries. Each entry is, in turn, a numpy ndarray with 784 values, representing the 28 * 28 = 784 pixels in a single MNIST image.
I would like to create a new training set; however, I don't know how to create an ndarray from other ndarrays. For instance, if I have the following two ndarrays:
a = np.ndarray((3,1), buffer=np.array([0.9,1.0,1.0]), dtype=float)
b = np.ndarray((3,1), buffer=np.array([0.8,1.0,1.0]), dtype=float)
How can I make a third one containing these two?
I tried the following but it creates only one entry.
c = np.ndarray((1,6,1), buffer=np.array(([a],[b])), dtype=float)
I would need it to be two entries.
Thanks; in the meantime I figured out that it is simply:
c = np.array((a, b))
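For comparison, np.array((a, b)) and np.stack add a new leading axis, one entry per input array, while np.concatenate joins along an existing axis. The same idea applies to building a training set from a list of 784-pixel image vectors. The zero-filled images below are a hypothetical placeholder for real MNIST rows.

```python
import numpy as np

a = np.array([0.9, 1.0, 1.0]).reshape(3, 1)
b = np.array([0.8, 1.0, 1.0]).reshape(3, 1)

c = np.array((a, b))               # new leading axis: shape (2, 3, 1), two entries
c_stack = np.stack([a, b])         # same result, with the new axis made explicit
c_concat = np.concatenate([a, b])  # joins along the existing axis 0: shape (6, 1)

# Hypothetical placeholder images, each a flat vector of 28 * 28 = 784 pixels.
images = [np.zeros(784) for _ in range(5)]
training_set = np.stack(images)    # shape (5, 784): one image per row
```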