I would like to create a NumPy array with 1000 rows and 2 columns.
The 1st column should contain Gaussian-distributed variables with standard deviation 2 and mean 1.
The 2nd column should contain Gaussian-distributed variables with mean -1 and standard deviation 0.5.
How do I create the array using defined values for the mean and standard deviation?
You can use numpy's random generators.
import numpy as np
# as per kwinkunks' suggestion
rng = np.random.default_rng()
arr1 = rng.normal(1, 2, 1000).reshape(1000, 1)    # mean 1, std 2
arr2 = rng.normal(-1, 0.5, 1000).reshape(1000, 1) # mean -1, std 0.5
arr1[:5]
array([[-2.8428678 ],
[ 2.52213097],
[-0.98329961],
[-0.87854616],
[ 0.65674208]])
arr2[:5]
array([[-0.85321735],
[-1.59748405],
[-1.77794019],
[-1.02239036],
[-0.57849622]])
After that, you can concatenate.
np.concatenate([arr1, arr2], axis=1)
# output
array([[-2.8428678 , -0.85321735],
[ 2.52213097, -1.59748405],
[-0.98329961, -1.77794019],
...,
[ 0.84249042, -0.26451526],
[ 0.6950764 , -0.86348222],
[ 3.53885426, -0.95546126]])
Use np.random.normal directly:
import numpy as np
np.random.normal([1, -1], [2, 0.5], (1000, 2))
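The newer Generator API from the first answer broadcasts loc and scale the same way, if you prefer it; a minimal equivalent of the one-liner above:
import numpy as np
rng = np.random.default_rng()
# loc and scale broadcast across the columns of the requested shape
rng.normal([1, -1], [2, 0.5], size=(1000, 2))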
You can just create two normal distributions with the mean and std for each and stack them.
np.hstack((np.random.normal(1, 2, size=(1000,1)), np.random.normal(-1, 0.5, size=(1000,1))))
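If you'd rather skip the explicit (1000, 1) shapes, np.column_stack stacks 1-D arrays as columns; a minimal equivalent:
np.column_stack((np.random.normal(1, 2, 1000), np.random.normal(-1, 0.5, 1000)))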
Accumulation stage
In the script, the same-sized data matrix X is re-estimated by some model (here just a random number generator (RNG)) and accumulated/saved in a matrix Y over the course of a finite number of trials t.
import numpy as np
from numpy.random import random
import pandas as pd
k = 3  # shape
t = 5  # trials
Y = np.zeros((t, k, k))
for i in range(t):
    X = random((k, k))   # 2D estimate
    X = pd.DataFrame(X)  # wrap as a DataFrame
    Y[i, :, :] = X       # accumulate into the 3D tensor
Reduction stage
Afterwards, how do I apply an element-wise reduction of all the accumulated 2D X arrays inside the 3D Y tensor into a single 2D matrix Z of the same shape as X? An example reduction is the average of all the individual X elements reduced into Z:
Z[0,0] = average of: {the first X[0,0], the second X[0,0], ..., the fifth X[0,0]}
I'd prefer no element-by-element loops, if possible. I showed the accumulation stage using numpy arrays because I don't think pandas DataFrames can be 3d tensors, being restricted to 2d inputs only, but can the arithmetic reduction stage (averaging across accumulated arrays) be done as a pandas DataFrame operation?
Is this what you're looking for?
Toy example:
test = np.arange(12).reshape(-1,2,3)
array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 6, 7, 8],
[ 9, 10, 11]]])
Solution
np.apply_over_axes(np.mean, test, 0).reshape(test.shape[1], test.shape[2])
array([[3., 4., 5.],
[6., 7., 8.]])
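For a plain average over the first axis you can also just use the mean method, which applied to the question's tensor would be Z = Y.mean(axis=0); on the toy example:
test.mean(axis=0)
# array([[3., 4., 5.],
#        [6., 7., 8.]])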
And IIRC you're correct: pandas can't really handle 3D tensors unless you mess about with MultiIndexes, so personally I would rather do this operation in NumPy first and then convert the 2D result to a DataFrame. You can convert DataFrames back to NumPy via to_numpy().
I'm trying to normalize an array to a range, e.g. [10, 100], but I also want to manually specify additional points in my result array, for example:
num = [1,2,3,4,5,6,7,8]
num_expected = [min(num), 5, max(num)]
expected_range = [10, 20, 100]
result_array = normalize(num, num_expected, expected_range)
Intended results:
Values from 1-5 are normalized to range (10,20].
5 in num array is mapped to 20 in expected range.
Values from 6-8 are normalized to range (20,100].
I know I can do it by normalizing the array twice, but I might have many additional points to add. I was wondering if there's any built-in function in numpy or scipy to do this?
I've checked MinMaxScaler in sklearn, but did not find the functionality I want.
Thanks!
Linear interpolation will do exactly what you want:
import scipy.interpolate
interp = scipy.interpolate.interp1d(num_expected, expected_range)
Then just pass numbers or arrays of numbers that you want to interpolate:
In [20]: interp(range(1, 9))
Out[20]:
array([ 10. , 12.5 , 15. , 17.5 ,
20. , 46.66666667, 73.33333333, 100. ])
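If you'd rather avoid the scipy dependency, np.interp performs the same piecewise-linear mapping in a single call:
import numpy as np
np.interp(range(1, 9), num_expected, expected_range)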
I'd like to get an NxM matrix where the numbers in each row are random samples generated from different normal distributions (same mean but different standard deviations). The following code works:
import numpy as np
mean = 0.0 # same mean
stds = [1.0, 2.0, 3.0] # different stds
matrix = np.random.random((3, 10))  # placeholder values, overwritten below
for i, std in enumerate(stds):
    matrix[i] = np.random.normal(mean, std, matrix.shape[1])
However, this code is not quite efficient as there is a for loop involved. Is there a faster way to do this?
np.random.normal() is vectorized; you can switch axes and transpose the result:
np.random.seed(444)
arr = np.random.normal(loc=0., scale=[1., 2., 3.], size=(1000, 3)).T
print(arr.mean(axis=1))
# [-0.06678394 -0.12606733 -0.04992722]
print(arr.std(axis=1))
# [0.99080274 2.03563299 3.01426507]
That is, the scale parameter is the column-wise standard deviation, hence the need to transpose via .T since you want row-wise inputs.
How about this?
rows = 10000
stds = [1, 5, 10]
data = np.random.normal(size=(rows, len(stds)))
scaled = data * stds
print(np.std(scaled, axis=0))
Output:
[ 0.99417905 5.00908719 10.02930637]
This exploits the fact that two normal distributions can be interconverted by linear scaling (in this case, multiplying by the standard deviation). In the output, each column (second axis) contains a normally distributed variable corresponding to a value in stds.
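If the common mean is non-zero, the same trick works by shifting after scaling, since a standard normal times std plus mean is distributed as N(mean, std**2); a minimal sketch reusing data and stds from above (the mean value here is an example, not from the question):
mean = 2.0
shifted = mean + data * stds  # column i ~ N(mean, stds[i]**2)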
I have a large number (M) of time series, each with N time points, stored in an MxN matrix. Then I also have a separate time series with N time points that I would like to correlate with all the time series in the matrix.
An easy solution is to go through the matrix row by row and run numpy.corrcoef. However, I was wondering if there is a faster or more concise way to do this?
Let's use the Pearson correlation formula:
r = sum((A_i - mean(A)) * (B_i - mean(B))) / sqrt(sum((A_i - mean(A))**2) * sum((B_i - mean(B))**2))
You can implement this with X as the M x N array and Y as the separate time series of N elements to be correlated with X. So, taking X and Y as A and B respectively, a vectorized implementation would look something like this -
import numpy as np
# Row-wise mean of input arrays & subtract from input arrays themselves
A_mA = A - A.mean(1)[:,None]
B_mB = B - B.mean()
# Sum of squares across rows
ssA = (A_mA**2).sum(1)
ssB = (B_mB**2).sum()
# Finally get corr coeff
out = np.dot(A_mA, B_mB) / np.sqrt(ssA * ssB)
# or: out = np.einsum('ij,j->i', A_mA, B_mB) / np.sqrt(ssA * ssB)
Verify results -
In [115]: A
Out[115]:
array([[ 0.1001229 , 0.77201334, 0.19108671, 0.83574124],
[ 0.23873773, 0.14254842, 0.1878178 , 0.32542199],
[ 0.62674274, 0.42252403, 0.52145288, 0.75656695],
[ 0.24917321, 0.73416177, 0.40779406, 0.58225605],
[ 0.91376553, 0.37977182, 0.38417424, 0.16035635]])
In [116]: B
Out[116]: array([ 0.18675642, 0.3073746 , 0.32381341, 0.01424491])
In [117]: out
Out[117]: array([-0.39788555, -0.95916359, -0.93824771, 0.02198139, 0.23052277])
In [118]: np.corrcoef(A[0],B), np.corrcoef(A[1],B), np.corrcoef(A[2],B)
Out[118]:
(array([[ 1. , -0.39788555],
[-0.39788555, 1. ]]),
array([[ 1. , -0.95916359],
[-0.95916359, 1. ]]),
array([[ 1. , -0.93824771],
[-0.93824771, 1. ]]))
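As a cross-check (and a shorter, if less memory-friendly, alternative), np.corrcoef also accepts a second input: np.corrcoef(A, B) stacks B as one more variable, so the last column of the result holds the same row-wise correlations, at the cost of computing the full (M+1) x (M+1) matrix:
np.corrcoef(A, B)[:-1, -1]  # correlations of each row of A with B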
import numpy as np
np.random.random(X)  # where X is a positive integer
This gives me an array of X numbers on the interval [0, 1). However, I want the numbers to be on the interval (-1, 1), and I don't know how to scale them in NumPy. How can I do this very simply using only NumPy?
You could simply use np.random.uniform:
>>> import numpy as np
>>> np.random.uniform(-1, 1, size=5)
array([-0.32235009, -0.8347222 , -0.83968268, 0.78546736, 0.399747 ])
Multiply the random values by 2, then subtract 1. This yields random values in the range -1 to 1.
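In code, with X as in the question:
2 * np.random.random(X) - 1  # maps [0, 1) to [-1, 1)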