Numpy array with different mean and standard deviation per column - python

I would like to get a numpy array with shape 1000 rows and 2 columns.
The 1st column should contain Gaussian-distributed variables with standard deviation 2 and mean 1.
The 2nd column should contain Gaussian-distributed variables with mean -1 and standard deviation 0.5.
How do I create the array using defined values of mean and std?

You can use numpy's random generators.
import numpy as np
# as per kwinkunks suggestion
rng = np.random.default_rng()
arr1 = rng.normal(1, 2, 1000).reshape(1000, 1)
arr2 = rng.normal(-1, 0.5, 1000).reshape(1000, 1)
arr1[:5]
array([[-2.8428678 ],
       [ 2.52213097],
       [-0.98329961],
       [-0.87854616],
       [ 0.65674208]])
arr2[:5]
array([[-0.85321735],
       [-1.59748405],
       [-1.77794019],
       [-1.02239036],
       [-0.57849622]])
After that, you can concatenate.
np.concatenate([arr1, arr2], axis=1)
# output
array([[-2.8428678 , -0.85321735],
       [ 2.52213097, -1.59748405],
       [-0.98329961, -1.77794019],
       ...,
       [ 0.84249042, -0.26451526],
       [ 0.6950764 , -0.86348222],
       [ 3.53885426, -0.95546126]])

Use np.random.normal directly:
import numpy as np
np.random.normal([1, -1], [2, 0.5], (1000, 2))
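The per-column means [1, -1] and standard deviations [2, 0.5] are broadcast against the requested shape. As a quick sanity check (a minimal sketch), the column-wise statistics should come out close to the requested values:
arr = np.random.normal([1, -1], [2, 0.5], (1000, 2))
print(arr.mean(axis=0))  # roughly [ 1. , -1. ]
print(arr.std(axis=0))   # roughly [ 2. ,  0.5]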

You can just create two normal distributions with the mean and std for each and stack them.
np.hstack((np.random.normal(1, 2, size=(1000,1)), np.random.normal(-1, 0.5, size=(1000,1))))


Accumulate 2D numpy arrays into a 3D tensor, then average them all element-wise after

Accumulation stage
In the script below, the same-sized data matrix X is re-estimated by some model (here just a random number generator) and accumulated/saved into a 3D array Y over the course of a finite number of trials t.
import numpy as np
from numpy.random import random
import pandas as pd

k = 3  # shape
t = 5  # trials
Y = np.zeros((t, k, k))
for i in range(t):
    X = random((k, k))    # 2D estimate
    X = pd.DataFrame(X)
    Y[i, :, :] = X        # accumulate into the 3D tensor
Reduction stage
Afterwards, how do I then apply an element-wise reduction of all accumulated 2d X arrays inside the 3d Y tensor into a single 2d matrix Z that is the same shape as X? An example reduction is the average of all the individual X elements reduced into Z:
Z[0,0] = average of {the first X[0,0], the second X[0,0], ..., the fifth X[0,0]}
I'd prefer no element-by-element loops, if possible. I showed the accumulation stage using numpy arrays because I don't think pandas DataFrames can be 3d tensors, being restricted to 2d inputs only, but can the arithmetic reduction stage (averaging across accumulated arrays) be done as a pandas DataFrame operation?
Is this what you're looking for?
Toy example:
test = np.arange(12).reshape(-1,2,3)
array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])
Solution
np.apply_over_axes(np.mean, test, 0).reshape(test.shape[1], test.shape[2])
array([[3., 4., 5.],
       [6., 7., 8.]])
And IIRC you're correct: pandas cannot really handle 3D tensors unless you mess about with MultiIndexes, so personally I would rather do this operation in numpy first and then convert the result to a DataFrame. (You can convert DataFrames to numpy via to_numpy().)
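Applied directly to the Y array from the question, a plain mean over the first axis does the same reduction (a minimal sketch; equivalent to the apply_over_axes call above):
Z = Y.mean(axis=0)        # shape (k, k): element-wise average over the t trials
Z_df = pd.DataFrame(Z)    # convert afterwards if a DataFrame is needed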

Min/max scaling with additional points

I'm trying to normalize an array within a range, e.g. [10,100]
But I also want to manually specify additional points in my result array, for example:
num = [1,2,3,4,5,6,7,8]
num_expected = [min(num), 5, max(num)]
expected_range = [10, 20, 100]
result_array = normalize(num, num_expected, expected_range)
Intended results:
Values from 1-5 are normalized to range (10,20].
5 in num array is mapped to 20 in expected range.
Values from 6-8 are normalized to range (20,100].
I know I can do it by normalizing the array twice, but I might have many additional points to add. I was wondering if there's any built-in function in numpy or scipy to do this?
I've checked MinMaxScaler in sklearn, but did not find the functionality I want.
Thanks!
Linear interpolation will do exactly what you want:
import scipy.interpolate
interp = scipy.interpolate.interp1d(num_expected, expected_range)
Then just pass numbers or arrays of numbers that you want to interpolate:
In [20]: interp(range(1, 9))
Out[20]:
array([ 10.        ,  12.5       ,  15.        ,  17.5       ,
        20.        ,  46.66666667,  73.33333333, 100.        ])
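Putting it together with the arrays from the question (a minimal, self-contained sketch):
import scipy.interpolate

num = [1, 2, 3, 4, 5, 6, 7, 8]
num_expected = [min(num), 5, max(num)]   # anchor points in the input
expected_range = [10, 20, 100]           # target values for those anchors
interp = scipy.interpolate.interp1d(num_expected, expected_range)
result_array = interp(num)               # 1..5 map linearly onto 10..20, 5..8 onto 20..100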

Numpy array with different standard deviation per row

I'd like to get an NxM matrix where the numbers in each row are random samples generated from different normal distributions (same mean but different standard deviations). The following code works:
import numpy as np

mean = 0.0              # same mean
stds = [1.0, 2.0, 3.0]  # different stds
matrix = np.random.random((3, 10))
for i, std in enumerate(stds):
    matrix[i] = np.random.normal(mean, std, matrix.shape[1])
However, this code is not quite efficient as there is a for loop involved. Is there a faster way to do this?
np.random.normal() is vectorized; you can switch axes and transpose the result:
np.random.seed(444)
arr = np.random.normal(loc=0., scale=[1., 2., 3.], size=(1000, 3)).T
print(arr.mean(axis=1))
# [-0.06678394 -0.12606733 -0.04992722]
print(arr.std(axis=1))
# [0.99080274 2.03563299 3.01426507]
That is, the scale parameter is the column-wise standard deviation, hence the need to transpose via .T since you want row-wise inputs.
How about this?
rows = 10000
stds = [1, 5, 10]
data = np.random.normal(size=(rows, len(stds)))
scaled = data * stds
print(np.std(scaled, axis=0))
Output:
[ 0.99417905 5.00908719 10.02930637]
This exploits the fact that two normal distributions can be interconverted by linear scaling (in this case, multiplying standard normal draws by the desired standard deviation). In the output, each column (second axis) contains a normally distributed variable corresponding to a value in stds.
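The same trick extends to a nonzero mean, since a variable with mean mu and standard deviation sigma is just sigma * Z + mu for a standard normal Z. A minimal sketch (the means here are illustrative):
rows = 10000
means = [0, 1, -1]                                # per-column means
stds = [1, 5, 10]                                 # per-column standard deviations
data = np.random.normal(size=(rows, len(stds)))   # standard normal draws
scaled = data * stds + means                      # scale, then shift, column by column
print(np.mean(scaled, axis=0))                    # roughly [ 0,  1, -1]
print(np.std(scaled, axis=0))                     # roughly [ 1,  5, 10]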

Correlate a single time series with a large number of time series

I have a large number (M) of time series, each with N time points, stored in an MxN matrix. Then I also have a separate time series with N time points that I would like to correlate with all the time series in the matrix.
An easy solution is to go through the matrix row by row and run numpy.corrcoef. However, I was wondering if there is a faster or more concise way to do this?
Let's use the Pearson correlation formula:
r = sum((x_i - mean(x)) * (y_i - mean(y))) / sqrt( sum((x_i - mean(x))^2) * sum((y_i - mean(y))^2) )
You can implement this with X as the M x N array and Y as the separate time series of N elements to be correlated with X. So, referring to X and Y as A and B respectively, a vectorized implementation would look something like this -
import numpy as np
# Row-wise mean of input arrays & subtract from input arrays themselves
A_mA = A - A.mean(1)[:,None]
B_mB = B - B.mean()
# Sum of squares across rows
ssA = (A_mA**2).sum(1)
ssB = (B_mB**2).sum()
# Finally get corr coeff
out = np.dot(A_mA,B_mB.T).ravel()/np.sqrt(ssA*ssB)
# OR out = np.einsum('ij,j->i',A_mA,B_mB)/np.sqrt(ssA*ssB)
Verify results -
In [115]: A
Out[115]:
array([[ 0.1001229 ,  0.77201334,  0.19108671,  0.83574124],
       [ 0.23873773,  0.14254842,  0.1878178 ,  0.32542199],
       [ 0.62674274,  0.42252403,  0.52145288,  0.75656695],
       [ 0.24917321,  0.73416177,  0.40779406,  0.58225605],
       [ 0.91376553,  0.37977182,  0.38417424,  0.16035635]])
In [116]: B
Out[116]: array([ 0.18675642, 0.3073746 , 0.32381341, 0.01424491])
In [117]: out
Out[117]: array([-0.39788555, -0.95916359, -0.93824771, 0.02198139, 0.23052277])
In [118]: np.corrcoef(A[0],B), np.corrcoef(A[1],B), np.corrcoef(A[2],B)
Out[118]:
(array([[ 1.        , -0.39788555],
        [-0.39788555,  1.        ]]),
 array([[ 1.        , -0.95916359],
        [-0.95916359,  1.        ]]),
 array([[ 1.        , -0.93824771],
        [-0.93824771,  1.        ]]))
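As a self-contained sketch of the same approach (the function name and test data here are illustrative), wrapped in a function and checked against a plain np.corrcoef loop:
import numpy as np

def corr_with_rows(A, B):
    # Pearson correlation of the 1-D series B with every row of the 2-D array A
    A_mA = A - A.mean(axis=1, keepdims=True)   # de-mean each row
    B_mB = B - B.mean()                        # de-mean the single series
    ssA = (A_mA ** 2).sum(axis=1)              # per-row sum of squares
    ssB = (B_mB ** 2).sum()                    # sum of squares of B
    return A_mA @ B_mB / np.sqrt(ssA * ssB)

rng = np.random.default_rng(0)
A = rng.random((5, 4))
B = rng.random(4)
print(corr_with_rows(A, B))
print([np.corrcoef(row, B)[0, 1] for row in A])   # should match element-wise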

how to make an array of scaled random digits

import numpy as np
np.random.random(X)  # where X is a positive integer
This gives me an array of X numbers on the interval [0, 1). However, I want the numbers to be on the interval (-1, 1), and I don't know how to scale them in numpy. How can I do this very simply, using only numpy?
You could simply use np.random.uniform:
>>> import numpy as np
>>> np.random.uniform(-1, 1, size=5)
array([-0.32235009, -0.8347222 , -0.83968268, 0.78546736, 0.399747 ])
Multiply the random values by 2, then subtract 1. This yields random values in the range -1 to 1.
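In code, that looks like (a one-line sketch; X is a positive integer as in the question):
2 * np.random.random(X) - 1   # values in [0, 1) scaled and shifted to [-1, 1)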
