Numpy statistics, creating an array with certain statistics properties - python

If I have some numpy array, I can measure its mean, median, standard deviation, and so on with numpy routines, http://docs.scipy.org/doc/numpy/reference/routines.statistics.html
For example, for array arr, I would run
import numpy as np
print(np.mean(arr))  # prints the mean
print(np.median(arr))  # prints the median
However, for my purposes, instead of measuring the statistical properties after an array is created, I would like to create an array whose data has specified statistical properties.
So, for example, I would like to create an array of shape (1000,) with mean 2.5 and variance 10, whose data points are i.i.d. Gaussian draws.
How could one do this with numpy?

You can use numpy.random.randn(size), which returns size samples from the standard normal distribution N(0, 1). To get the desired distribution, multiply by the standard deviation and add the mean:
import numpy as np
m = 2.5
std = np.sqrt(10)
v = m + std*np.random.randn(1000)
print(np.mean(v))  # e.g. 2.43375955445
print(np.var(v))  # e.g. 9.9049376296

Yes, you can do this with the numpy library:
import numpy as np
import math
mean = 2.5
deviation = math.sqrt(10)
s = np.random.normal(mean, deviation, 1000)
This gives you an array of 1000 data points drawn from a normal distribution with mean 2.5 and variance 10. The sample mean and variance will be close to these values but not exactly equal.
For more information you can check this link http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.normal.html
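Both answers above draw from a distribution with the requested parameters, so the measured mean and variance of the array only approximate them. If the array itself must hit the targets, one option is to standardize the draws first. The following is a minimal sketch under that assumption; the names rng, target_mean and target_var are illustrative, and the resulting points are no longer strictly independent because they are tied together by the standardization.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

target_mean = 2.5
target_var = 10.0

# Draw standard normal samples, then standardize them so the *sample*
# mean is 0 and the *sample* variance (ddof=0) is 1.
z = rng.standard_normal(1000)
z = (z - z.mean()) / z.std()

# Rescale to the target moments.
v = target_mean + np.sqrt(target_var) * z

print(v.mean())  # equals 2.5 up to floating-point rounding
print(v.var())   # equals 10.0 up to floating-point rounding
```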

Related

Get nearest coordinate in a 2D numpy array

I've seen many posts on how to get the closest value in a numpy array, how to get the closest coordinate in a 2D array, etc. But none of them seem to solve what I am looking for.
The problem is, I have a 2D numpy array as such:
[[77.62881735 12.91172607]
[77.6464534 12.9230648]
[77.65330961 12.92020244]
[77.63142413 12.90909731]]
And I have one numpy array like this:
[77.64000112 12.91602265]
Now I want to find the coordinate in the 2D numpy array that is closest to the coordinates in the 1D array.
That said, I am a beginner at this, so any input is appreciated.
I assume you mean Euclidean distance. Try this:
import numpy as np
a = np.array([[77.62881735, 12.91172607],
[77.6464534, 12.9230648],
[77.65330961,12.92020244],
[77.63142413 ,12.90909731]])
b = np.array([77.64000112, 12.91602265])
idx_min = np.sum( (a-b)**2, axis=1, keepdims=True).argmin(axis=0)
idx_min, a[idx_min]
Output:
(array([1], dtype=int64), array([[77.6464534, 12.9230648]]))
You need to implement your own distance function.
My example uses Euclidean distance for simplicity:
import numpy as np
import math
def compute_distance(coord1, coord2):
    return math.sqrt(pow(coord1[0] - coord2[0], 2) + pow(coord1[1] - coord2[1], 2))
gallery = np.asarray([[77.62881735, 12.91172607],
[77.6464534, 12.9230648],
[77.65330961, 12.92020244],
[77.63142413, 12.90909731]])
query = np.asarray([77.64000112, 12.91602265])
distances = [compute_distance(i, query) for i in gallery]
min_coord = gallery[np.argmin(distances)]
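The per-row loop above can also be written with broadcasting. This is a sketch of the same Euclidean nearest-neighbour lookup using np.linalg.norm; the names nearest_idx and nearest are illustrative.

```python
import numpy as np

gallery = np.asarray([[77.62881735, 12.91172607],
                      [77.6464534, 12.9230648],
                      [77.65330961, 12.92020244],
                      [77.63142413, 12.90909731]])
query = np.asarray([77.64000112, 12.91602265])

# Broadcasting subtracts the query from every row; the norm along axis=1
# gives one Euclidean distance per row.
distances = np.linalg.norm(gallery - query, axis=1)
nearest_idx = int(np.argmin(distances))
nearest = gallery[nearest_idx]
print(nearest_idx, nearest)
```

This returns a plain integer index rather than a one-element array, which is usually easier to use for further indexing.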

How to find the difference of images in numpy arrays?

I'm trying to calculate the difference between 2 images. I'm expecting an integer as my result, but I'm not getting what I expect.
from imageio import imread
#https://raw.githubusercontent.com/glennford49/sampleImages/main/cat1.png
#https://raw.githubusercontent.com/glennford49/sampleImages/main/cat2.png
img1="cat1.png" # 183X276
img2="cat2.png" # 183x276
numpyImg1=[]
numpyImg2=[]
img1=imread(img1)
img2=imread(img2)
numpyImg1.append(img1)
numpyImg2.append(img2)
diff = numpyImg1[0] - numpyImg2[0]
result = sum(abs(diff))
print("difference:",result)
It prints:
# an array rather than a single integer
Target:
difference: <int>
You are using Python's built-in sum function, which only sums along the first dimension of a NumPy array. That is why you get a 2D array as the output instead of the single integer you expect. Use numpy.sum on your result instead, which by default sums over all elements of a multi-dimensional array. You might as well use numpy.abs for the absolute value too:
import numpy as np
result = np.sum(np.abs(diff))
Using numpy.sum means that you no longer need to reshape your array into a flattened representation prior to using the built-in sum function, as you do in your answer. Going forward, prefer NumPy functions for any arithmetic on NumPy arrays; this prevents unexpected behaviour such as what you've just seen.
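One further point may matter here, assuming the images are loaded as unsigned 8-bit arrays (common for PNG files, but not confirmed in the question): subtracting uint8 arrays wraps around instead of going negative, so the absolute difference can be wrong. The sketch below uses tiny made-up arrays in place of the real images and widens the dtype before subtracting.

```python
import numpy as np

# Tiny stand-in "images" (hypothetical data) with the dtype that image
# readers commonly return: unsigned 8-bit integers.
img1 = np.array([[[10, 20, 30]]], dtype=np.uint8)
img2 = np.array([[[20, 10, 30]]], dtype=np.uint8)

# Subtracting uint8 arrays directly wraps around (10 - 20 becomes 246),
# so widen to a signed type before taking the difference.
diff = img1.astype(np.int64) - img2.astype(np.int64)

result = int(np.sum(np.abs(diff)))
print("difference:", result)
```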
A (colored) image is a 3D matrix, so you can convert each image to a numpy array using numpy.array(image) and then take the difference of the two arrays.
The result will be a 3-dimensional array.
I believe the dimension of the numpy array is not 1. You need to apply sum once per dimension of the array to get a single value.
[1,2,3]
sum gives : 6
[[1,2,3],[1,2,3]]
sum gives : [2,4,6]
Doing a second sum operation gives
: 12 (single value)
You may need to add one more sum(result) before printing the data (if the image is 2-dimensional).
eg:
numpyImg2.append(img2)
diff = numpyImg1[0] - numpyImg2[0]
result = sum(abs(diff))
result = sum(result)  # repeat once for each remaining dimension
print("difference:",result)
This is my answer for finding the difference of 2 images across the RGB channels.
If an image is subtracted from an identical image,
it prints:
difference per pixel: 0
from numpy import sum
from imageio import imread
#https://github.com/glennford49/sampleImages/blob/main/cat2.png
#https://github.com/glennford49/sampleImages/blob/main/cat2.png
img1="cat1.png"
img2="cat2.png"
numpyImg1=[]
numpyImg2=[]
img1=imread(img1)
img2=imread(img2)
numpyImg1.append(img1)
numpyImg2.append(img2)
diff = numpyImg1[0] - numpyImg2[0]
result = sum(diff/numpyImg1[0].size)
result = sum(abs(result.reshape(-1)))
print("difference per pixel:",result)

Numpy array with different standard deviation per row

I'd like to get an NxM matrix where the numbers in each row are random samples generated from different normal distributions (same mean but different standard deviations). The following code works:
import numpy as np
mean = 0.0 # same mean
stds = [1.0, 2.0, 3.0] # different stds
matrix = np.random.random((3,10))
for i, std in enumerate(stds):
    matrix[i] = np.random.normal(mean, std, matrix.shape[1])
However, this code is not very efficient because it involves a for loop. Is there a faster way to do this?
np.random.normal() is vectorized; you can switch axes and transpose the result:
np.random.seed(444)
arr = np.random.normal(loc=0., scale=[1., 2., 3.], size=(1000, 3)).T
print(arr.mean(axis=1))
# [-0.06678394 -0.12606733 -0.04992722]
print(arr.std(axis=1))
# [0.99080274 2.03563299 3.01426507]
That is, the scale parameter is the column-wise standard deviation, hence the need to transpose via .T since you want row-wise inputs.
How about this?
rows = 10000
stds = [1, 5, 10]
data = np.random.normal(size=(rows, len(stds)))
scaled = data * stds
print(np.std(scaled, axis=0))
Output:
[ 0.99417905 5.00908719 10.02930637]
This exploits the fact that two normal distributions can be interconverted by linear scaling (in this case, multiplying by the standard deviation). In the output, each column (second axis) will contain a normally distributed variable corresponding to a value in stds.
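If the row-wise layout from the question is wanted without a transpose, the same scaling idea works with a column vector of standard deviations. This is a sketch, assuming the newer np.random.default_rng generator is available; the names rng and n_cols are illustrative.

```python
import numpy as np

rng = np.random.default_rng(444)  # arbitrary seed

stds = np.array([1.0, 2.0, 3.0])
n_cols = 10000

# stds[:, None] has shape (3, 1), so it broadcasts across the columns of a
# (3, n_cols) array: row i is scaled by stds[i].
matrix = stds[:, None] * rng.standard_normal((stds.size, n_cols))

print(matrix.shape)        # (3, 10000)
print(matrix.std(axis=1))  # roughly [1, 2, 3]; exact values depend on the seed
```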

Efficient two dimensional numpy array statistics

I have many 100x100 grids, is there an efficient way using numpy to calculate the median for every grid point and return just one 100x100 grid with the median values? Presently, I'm using a for loop to run through each grid point, calculating the median and then combining them into one grid at the end. I'm sure there's a better way to do this using numpy. Any help would be appreciated! Thanks!
Create a 100x100xN array (or stack the grids together if that's not possible) and use np.median with the correct axis to do it in one go:
import numpy as np
a = np.random.rand(100,100)
b = np.random.rand(100,100)
c = np.random.rand(100,100)
d = np.dstack((a,b,c))
result = np.median(d,axis=2)
How many grids are there?
One option would be to create a 3D array that is 100x100xnumGrids and compute the median across the 3rd dimension.
Use the axis parameter of median:
import numpy as np
data = np.random.rand(100, 5, 5)
print(np.median(data, axis=0))
print(np.median(data[:, 0, 0]))
print(np.median(data[:, 1, 0]))
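When the grids already exist as separate arrays in a list, np.stack can build the 3D array in one call. The sketch below uses three small constant grids (made-up data, 2x2 instead of 100x100) so that the per-point median is easy to verify.

```python
import numpy as np

# Three small hypothetical grids with constant values, so the per-point
# median is easy to check by eye.
grids = [np.full((2, 2), 1.0), np.full((2, 2), 5.0), np.full((2, 2), 3.0)]

stacked = np.stack(grids, axis=0)         # shape (3, 2, 2)
median_grid = np.median(stacked, axis=0)  # shape (2, 2)
print(median_grid)  # every entry is 3.0, the median of 1, 5 and 3
```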

Python - sparse vectors/distance calculation

I'm looking for dynamically growing vectors in Python, since I don't know their length in advance. In addition, I would like to calculate distances between these sparse vectors, preferably using the distance functions in scipy.spatial.distance (although any other suggestions are welcome). Any ideas how to do this? (Initially, it doesn't need to be efficient.)
Thanks a lot in advance!
You can use regular python lists (which are dynamic) as vectors. Trivial example follows.
from scipy.spatial.distance import sqeuclidean
a = [1,2,3]
b = [0,0,0]
print(sqeuclidean(a, b))  # 14
As per aganders3's suggestion, do note that you can also use numpy arrays if needed:
import numpy
a = numpy.array([1,2,3])
If the sparse part of your question is crucial, I'd use scipy for that; it has support for sparse matrices. You can define a 1xn matrix and use it as a vector. This works (the parameter is the size of the matrix, filled with zeroes by default):
sqeuclidean(scipy.sparse.coo_matrix((1,3)),scipy.sparse.coo_matrix((1,3))) # 0
There are many kinds of sparse matrices, some dictionary-based (see comment). You can define a row sparse matrix from a list like this:
scipy.sparse.csr_matrix([1,2,3])
Here is how you can do it in numpy:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([0, 0, 0])
c = np.sum(((a - b) ** 2)) # 14
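Since the question asks for vectors that grow dynamically and says efficiency is not a concern at first, a plain dict mapping index to value is another possible representation. The helper sq_euclidean_sparse below is hypothetical (it is not part of scipy) and treats missing indices as zero.

```python
def sq_euclidean_sparse(a, b):
    """Squared Euclidean distance between two dict-based sparse vectors."""
    # Only indices present in at least one vector can contribute.
    indices = set(a) | set(b)
    return sum((a.get(i, 0) - b.get(i, 0)) ** 2 for i in indices)

a = {}  # grows dynamically as entries are assigned
a[0] = 1
a[1] = 2
a[2] = 3
b = {}  # the all-zero vector
print(sq_euclidean_sparse(a, b))  # 14
```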
