Basically I have two arrays, one containing the values of the x-axis and the second containing the values of the y-axis. The problem is, when I do
plt.semilogy(out_samp,error_mc)
I get this
Which doesn't make any sense. That is because the plot function plots the points in the order they appear in the x array, regardless of whether it is sorted in ascending order. How can I sort the x array by increasing value and reorder the y array in the same way, so that the points stay the same but the line connects them in order instead of making this mess?
It is easier to zip, sort and unzip the two lists of data.
Example:
xs = [...]
ys = [...]
xs, ys = zip(*sorted(zip(xs, ys)))
plot(xs, ys)
See the zip documentation here: https://docs.python.org/3.5/library/functions.html#zip
Sort by the x values before plotting. Here is an MWE.
x = [3, 5, 6, 1, 2]
y = [6, 7, 8, 9, 10]
# Pair up the values and sort the pairs by x
# (in Python 3 the built-in zip replaces itertools.izip, which no longer exists)
lists = sorted(zip(x, y))
new_x, new_y = zip(*lists)
# import operator
# new_x = list(map(operator.itemgetter(0), lists)) # [1, 2, 3, 5, 6]
# new_y = list(map(operator.itemgetter(1), lists)) # [9, 10, 6, 7, 8]
# Plot
import matplotlib.pyplot as plt
plt.plot(new_x, new_y)
plt.show()
The same thing as a one-liner, using zip as mentioned by other answerers:
new_x, new_y = zip(*sorted(zip(x, y)))
The result,
An alternative to sorting the lists would be to use NumPy arrays and np.sort(). The advantage of arrays is that a function like y=f(x) can be computed as a vectorized operation. The following example plots a normal distribution (assuming import numpy as np and import matplotlib.pyplot as plt):
Without using sorted data
mu, sigma = 0, 0.1
x = np.random.normal(mu, sigma, 200)
f = 1/(sigma * np.sqrt(2 * np.pi)) *np.exp( - (x - mu)**2 / (2 * sigma**2) )
plt.plot(x,f, '-bo', ms = 2)
Output 1
Using np.sort(): this allows straightforwardly using the sorted array x while computing the normal distribution.
mu, sigma = 0, 0.1
x = np.sort(np.random.normal(mu, sigma, 200))
# or sort in place: x = np.random.normal(mu, sigma, 200); x.sort()
# (ndarray.sort() returns None, so do not assign its result to x)
f = 1/(sigma * np.sqrt(2 * np.pi)) *np.exp( - (x - mu)**2 / (2 * sigma**2) )
plt.plot(x,f, '-bo', ms = 2)
Alternatively if you already have both x and y data unsorted, you may use numpy.argsort to sort them a posteriori
mu, sigma = 0, 0.1
x = np.random.normal(mu, sigma, 200)
f = 1/(sigma * np.sqrt(2 * np.pi)) *np.exp( - (x - mu)**2 / (2 * sigma**2) )
plt.plot(np.sort(x), f[np.argsort(x)], '-bo', ms = 2)
Notice that the code above uses sort() twice: first with np.sort(x) and then with f[np.argsort(x)]. The total sort() invocations can be reduced to one:
# once you have your x and f...
indices = np.argsort(x)
plt.plot(x[indices], f[indices], '-bo', ms = 2)
In both cases the output is
Output 2
Just do this:
pairs = zip(*sorted(zip(x, y)))  # avoid the name "list", which shadows the built-in
plt.plot(*pairs)
The sorted function compares the (x, y) tuples element by element, so the pairs are ordered by their x values.
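Because whole (x, y) tuples are compared, equal x values end up ordered by y. If ties should instead keep their original order, a key function can be passed to sorted. A minimal sketch with made-up sample values (the names pairs, xs_sorted and ys_sorted are only for illustration):

```python
from operator import itemgetter

x = [3, 1, 2, 1]
y = [30, 99, 20, 10]

# Sort the (x, y) pairs by x only; sorted() is stable, so pairs with
# equal x keep their original relative order.
pairs = sorted(zip(x, y), key=itemgetter(0))
xs_sorted, ys_sorted = zip(*pairs)

print(xs_sorted)  # (1, 1, 2, 3)
print(ys_sorted)  # (99, 10, 20, 30)
```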
I think you need to sort one array and the other array should also get sorted based on the first array. I got this solution from some other stack overflow question. Most probably this should be your solution.
out_samp,error_mc=zip(*sorted(zip(out_samp,error_mc)))
Now plot those two values, you get a correct graph.
You can convert your arrays to NumPy arrays, call argsort on the first array, and then index both arrays with the resulting index array.
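A minimal sketch of that argsort approach, reusing the names out_samp and error_mc from the question with made-up values (the real data is not shown in the question):

```python
import numpy as np

# Made-up sample data standing in for the arrays from the question
out_samp = np.array([300, 100, 200])
error_mc = np.array([0.03, 0.10, 0.05])

order = np.argsort(out_samp)        # indices that put out_samp in ascending order
out_samp_sorted = out_samp[order]
error_mc_sorted = error_mc[order]   # same permutation keeps the pairs together

print(out_samp_sorted.tolist())  # [100, 200, 300]
print(error_mc_sorted.tolist())  # [0.1, 0.05, 0.03]
```

The sorted arrays can then be passed to plt.semilogy(out_samp_sorted, error_mc_sorted).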
I have a data array with shape 100x100. I want to divide it into 5x5 blocks, each containing 20x20 grid points. The value I want for each block is the sum of all values in it.
Is there a more elegant way to accomplish it?
x = np.arange(100)
y = np.arange(100)
X, Y = np.meshgrid(x, y)
Z = np.cos(X)*np.sin(Y)
Z_new = np.zeros((5, 5))
for i in range(5):
    for j in range(5):
        Z_new[i, j] = np.sum(Z[i*20:20+i*20, j*20:20+j*20])
This is based on the index. What if it should be based on x instead?
x = np.linspace(0, 1, 100)
y = np.linspace(0, 1, 100)
X, Y = np.meshgrid(x, y)
Z = np.cos(X)*np.sin(Y)
x_new = np.linspace(0, 1, 15)
y_new = np.linspace(0, 1, 15)
Z_new?
Simply reshape, splitting each of the two axes into two with shape (5,20), to form a 4D array, and then sum-reduce along the axes of length 20, like so -
Z_new = Z.reshape(5,20,5,20).sum(axis=(1,3))
Functionally the same, but potentially faster option with np.einsum -
Z_new = np.einsum('ijkl->ik',Z.reshape(5,20,5,20))
Generic block size
Extending to a generic case -
H,W = 5,5 # number of blocks along each axis (each block is then m//H by n//W)
m,n = Z.shape
Z_new = Z.reshape(H,m//H,W,n//W).sum(axis=(1,3))
With einsum that becomes -
Z_new = np.einsum('ijkl->ik',Z.reshape(H,m//H,W,n//W))
To compute average/mean across blocks, use mean instead of sum method.
Generic block size and reduction operation
Extending this to any reduction function that accepts a tuple of axes through its axis parameter, it would be -
def blockwise_reduction(a, height, width, reduction_func=np.sum):
    # height, width: number of blocks along each axis (must divide a.shape evenly)
    m,n = a.shape
    a4D = a.reshape(height,m//height,width,n//width)
    return reduction_func(a4D,axis=(1,3))
Thus, to solve our specific case, it would be :
blockwise_reduction(Z, height=5, width=5)
and for a block-wise average computation, it would be -
blockwise_reduction(Z, height=5, width=5, reduction_func=np.mean)
You can do following.
t = np.eye(5).repeat(20, axis=1)
Z_new = t.dot(Z).dot(t.T)
This is correct because Z_new[i, j] is the sum over k and l of t[i, k] * Z[k, l] * t[j, l], where t[i, k] is 1 if index k belongs to block i and 0 otherwise.
Also this seems faster than Divakar's solution.
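As a sanity check, the three approaches can be compared on the data from the question. This sketch only checks that the results agree; it makes no claim about relative speed, and the names Z_loop, Z_reshape and Z_matmul are illustrative:

```python
import numpy as np

x = np.arange(100)
y = np.arange(100)
X, Y = np.meshgrid(x, y)
Z = np.cos(X) * np.sin(Y)

# Reference: explicit loop over the 5x5 blocks of 20x20 values
Z_loop = np.zeros((5, 5))
for i in range(5):
    for j in range(5):
        Z_loop[i, j] = Z[20*i:20*(i+1), 20*j:20*(j+1)].sum()

# Reshape-based reduction
Z_reshape = Z.reshape(5, 20, 5, 20).sum(axis=(1, 3))

# Matrix-product version: t[i, k] is 1 when index k belongs to block i
t = np.eye(5).repeat(20, axis=1)   # shape (5, 100)
Z_matmul = t @ Z @ t.T

print(np.allclose(Z_loop, Z_reshape), np.allclose(Z_loop, Z_matmul))  # True True
```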
Such a problem is a very good candidate for a function like scipy.ndimage.sum (formerly available as scipy.ndimage.measurements.sum), since it supports "grouping" and "labelling" terms. You will have what you want with something like:
import scipy.ndimage
# one label per 20x20 block: 5 block rows times 5 block columns = 25 labels
labels = [[5*(y//20) + x//20 for x in range(100)] for y in range(100)]
s = scipy.ndimage.sum(Z, labels, range(25)).reshape(5, 5)
(Not tested, but that is the idea).
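For the second part of the question (binning by x and y values rather than by index), one possible sketch uses np.histogram2d with the Z values as weights. This assumes the new grid is meant as bin edges, which the question does not state; x_edges and y_edges are illustrative names, and 6 edges are used here so that the result can be compared with the 5x5 index-based blocks.

```python
import numpy as np

x = np.linspace(0, 1, 100)
y = np.linspace(0, 1, 100)
X, Y = np.meshgrid(x, y)
Z = np.cos(X) * np.sin(Y)

# Assumption: the new grid is given as bin edges (6 edges -> 5 bins per axis)
x_edges = np.linspace(0, 1, 6)
y_edges = np.linspace(0, 1, 6)

# Z[r, c] belongs to y[r] and x[c], so Y is passed first to keep the
# row/column layout of Z in the result.
Z_new, _, _ = np.histogram2d(Y.ravel(), X.ravel(),
                             bins=[y_edges, x_edges],
                             weights=Z.ravel())

print(Z_new.shape)  # (5, 5)
```

With these edges every bin holds exactly 20 grid points per axis, so the result matches Z.reshape(5,20,5,20).sum(axis=(1,3)); with unevenly filled bins the two would differ.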
I have a NumPy array with equations solved symbolically, with constants a and b. Here's an example of the cell at index (2,0) in my array "bounds_symbolic":
-a*sqrt(1/(a**6*b**2+1))
I also have an array, called "a_values", that I would like to substitute into my "bounds_symbolic" array. I also have the b-value set to 1, which I would also like to substitute in. Keeping the top row of the arrays intact would also be nice.
In other words, for the cell indexed at (2,0) in "bounds_symbolic", I want to substitute all of my a and b-values into the equation, while extending the column to contain the substituted equations. I then want to do this operation for the entirety of the "bounds_symbolic" array.
Here is the code that I have so far:
import sympy
import numpy as np
a, b, x, y = sympy.symbols("a b x y")
# Equation of the ellipse solved for y
ellipse = sympy.sqrt((b ** 2) * (1 - ((x ** 2) / (a ** 2))))
# Functions to be tested
test_functions = np.array(
    [(a * b * x), (((a * b) ** 2) * x), (((a * b) ** 3) * x), (((a * b) ** 4) * x), (((a * b) ** 5) * x)])
# Equating ellipse and test_functions so their intersection can be symbolically solved for
equate = np.array(
    [sympy.Eq(ellipse, test_functions[0]), sympy.Eq(ellipse, test_functions[1]), sympy.Eq(ellipse, test_functions[2]),
     sympy.Eq(ellipse, test_functions[3]), sympy.Eq(ellipse, test_functions[4])])
# Calculating the intersection points of the ellipse and the testing functions
# Array that holds the bounds of the integral solved symbolically
bounds_symbolic = np.array([])
for i in range(0, 5):
    bounds_symbolic = np.append(bounds_symbolic, sympy.solve(equate[i], x))
# Array of a-values to plug into the bounds of the integral
a_values = np.array(np.linspace(-10, 10, 201))
# Setting b equal to a constant of 1
b = 1
integrand = np.array([])
for j in range(0, 5):
    integrand = np.append(integrand, (ellipse - test_functions[j]))
# New array with a-values substituted into the bounds
bounds_a = bounds_symbolic
# for j in range(0, 5):
# bounds_a = np.append[:, ]
Thank you!
Numpy arrays are the best choice when working with pure numerical data, for which they can help speed up many types of calculations. Once you start mixing sympy expressions, things can get very messy. You'll also lose all the speed advantages of numpy arrays.
Apart from that, np.append is a very slow operation, as it needs to recreate the complete array every time it is executed. When creating a new numpy array, the recommended way is to first create an array that already has its final size (e.g. with np.zeros()) and then fill it.
You should also check out Python's list comprehensions, as they ease the creation of lists. In "pythonic" code, indices are used as little as possible. List comprehensions may look a bit weird when you are used to other programming languages, but you quickly get used to them, and from then on you'll certainly prefer them.
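To illustrate both suggestions, here is a small sketch with a toy computation (squares of integers, chosen only for illustration and unrelated to the ellipse problem). It fills a preallocated array, then builds the same values with a list comprehension that is converted to an array once at the end:

```python
import numpy as np

n = 5

# Option 1: preallocate the final size, then fill in place
squares = np.zeros(n)
for i in range(n):
    squares[i] = i ** 2

# Option 2: build a Python list with a comprehension, convert once at the end
squares_from_list = np.array([i ** 2 for i in range(n)], dtype=float)

print(squares.tolist())                            # [0.0, 1.0, 4.0, 9.0, 16.0]
print(np.array_equal(squares, squares_from_list))  # True
```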
In your example code, numpy is useful for the np.linspace command, which creates an array of numbers (again converting them with np.array isn't necessary). And at the end, you might want to convert the substituted values to a numpy array. Note that this won't work when solve would return a different number of solutions for some of the equations, as numpy arrays need an equal size for all its elements. Also note that an explicit conversion from sympy's numerical type to a dtype understood by numpy might be needed. (Sympy often works with higher precision, not caring for the loss of speed.)
Also note that if you assign b = 1, you create a new variable and lose the variable pointing to the sympy symbol. It's recommended to use another name. Just writing b = 1 will not change the value of the symbol. You need subs to substitute symbols with values.
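A short sketch of the difference, using the expression quoted in the question for cell (2,0). The use of sympy.lambdify here is an optional alternative to calling subs for each value separately, and the names expr, expr_b1 and f are illustrative:

```python
import numpy as np
import sympy

a, b = sympy.symbols("a b")
expr = -a * sympy.sqrt(1 / (a**6 * b**2 + 1))

b_val = 1                        # separate name, so the symbol b stays usable
expr_b1 = expr.subs(b, b_val)    # still symbolic in a

# lambdify turns the expression into a function that accepts NumPy arrays
f = sympy.lambdify(a, expr_b1, "numpy")
a_values = np.linspace(-10, 10, 201)
values = f(a_values)

print(values.shape)  # (201,)
```

For a = 1 this gives -1/sqrt(2), the same value that expr_b1.subs(a, 1) evaluates to.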
Summarizing, your code could look like this:
import sympy
import numpy as np
a, b, x, y = sympy.symbols("a b x y")
# Equation of the ellipse solved for y
ellipse = sympy.sqrt((b ** 2) * (1 - ((x ** 2) / (a ** 2))))
# Functions to be tested
test_functions = [a * b * x, ((a * b) ** 2) * x, ((a * b) ** 3) * x, ((a * b) ** 4) * x, ((a * b) ** 5) * x]
# Equating ellipse and test_functions so their intersection can be symbolically solved for
# Array that holds the bounds of the integral solved symbolically
bounds_symbolic = [sympy.solve(sympy.Eq(ellipse, fun), x) for fun in test_functions]
# Array of a-values to plug into the bounds of the integral
a_values = np.linspace(-10, 10, 201)
# Setting b equal to a constant of 1
b_val = 1
# New array with a-values substituted into the bounds
bounds_a = [[[bound.subs({a: a_val, b: b_val}) for bound in bounds]
             for bounds in bounds_symbolic]
            for a_val in a_values]
bounds_a = np.array(bounds_a, dtype='float') # shape: (201, 5, 2)
The values of the resulting array can for example be used for plotting:
import matplotlib.pyplot as plt
for i, (test_func, color) in enumerate(zip(test_functions, plt.cm.Set1.colors)):
    plt.plot(a_values, bounds_a[:, i, 0], color=color, label=test_func)
    plt.plot(a_values, bounds_a[:, i, 1], color=color, alpha=0.5)
plt.legend()
plt.margins(x=0)
plt.xlabel('a')
plt.ylabel('bounds')
plt.show()
Or filled:
for i, (test_func, color) in enumerate(zip(test_functions, plt.cm.Set1.colors)):
    plt.plot(a_values, bounds_a[:, i, :], color=color)
    plt.fill_between(a_values, bounds_a[:, i, 0], bounds_a[:, i, 1], color=color, alpha=0.1)
In my simulation I compute multiple values for a phase, for example
phi = np.linspace(-N,N,1000)
where N can be large.
Is there an easy way to map the values to the interval [0, 2π)?
Does this work?
import numpy as np
import math
N=10
phi = np.linspace(-N,N,1000)
phi = phi%(2*math.pi)
print(phi)
Output
[2.56637061 2.58639063 ... 3.69679467 3.71681469]
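To check that the modulo approach also handles the negative phases, here is a small sketch using np.mod, which is the function behind the % operator for arrays (the names wrapped and turns are illustrative):

```python
import numpy as np

N = 10
phi = np.linspace(-N, N, 1000)

# For a positive divisor the result of np.mod takes the sign of the divisor,
# so negative phases are wrapped into [0, 2*pi) as well.
wrapped = np.mod(phi, 2 * np.pi)

print(wrapped.min() >= 0, wrapped.max() < 2 * np.pi)  # True True

# Wrapping changes each value only by a whole number of turns
turns = (phi - wrapped) / (2 * np.pi)
print(np.allclose(turns, np.round(turns)))  # True
```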
It sounds like you are looking for np.interp, which rescales the values linearly (it does not wrap them modulo 2π). SciPy offers interpolation functions too.
For a usage example, to map the values of phi (which are between -N and N) to [0, 2π] try
np.interp(phi, (-N, N), (0, 2*np.pi))
To exclude 2π you could either change the upper bound so that no value maps onto 2π:
np.interp(phi, (-N, N + 1), (0, 2*np.pi))
Or reduce the largest value you include in phi
phi = np.linspace(-N, N, 1000, endpoint=False)
I believe it would be easier to just ask for the values directly.
For example, 1000 points over the range [0, 2π] can be given by
np.linspace(0, 2*np.pi, 1000)
And for the range [0, 2π) which excludes the value 2π
np.linspace(0, 2*np.pi, 1000, endpoint=False)
Another possible way:
First, convert to the [-pi, pi] interval using np.arctan2(np.sin(angle), np.cos(angle)). Then, you still need to transform the negative values. A final function like this would work:
def convert_angle_to_0_2pi_interval(angle):
    new_angle = np.arctan2(np.sin(angle), np.cos(angle))
    if new_angle < 0:
        new_angle = abs(new_angle) + 2 * (np.pi - abs(new_angle))
    return new_angle
To confirm, run:
angles = [10, 200, 365, -10, -180]
print(np.rad2deg([convert_angle_to_0_2pi_interval(np.deg2rad(a)) for a in angles]))
...which prints: [ 10., 200., 5., 350., 180.]
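The function above tests new_angle < 0 with a plain if, so it handles one scalar at a time. A possible vectorized variant, sketched here under the hypothetical name wrap_to_0_2pi, uses np.mod instead of arctan2 and can be checked against the same test angles:

```python
import numpy as np

def wrap_to_0_2pi(angle):
    """Map angles in radians to [0, 2*pi); works on scalars and arrays."""
    return np.mod(angle, 2 * np.pi)

angles_deg = np.array([10, 200, 365, -10, -180])
wrapped_deg = np.rad2deg(wrap_to_0_2pi(np.deg2rad(angles_deg)))

print(np.allclose(wrapped_deg, [10, 200, 5, 350, 180]))  # True
```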
I can generate Gaussian data with random.gauss(mu, sigma) function, but how can I generate 2D gaussian? Is there any function like that?
If you can use numpy, there is numpy.random.multivariate_normal(mean, cov[, size]).
For example, to get 10,000 2D samples:
np.random.multivariate_normal(mean, cov, 10000)
where mean.shape==(2,) and cov.shape==(2,2).
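A concrete sketch with made-up values for mean and cov. It uses the newer Generator interface (np.random.default_rng), which provides the same multivariate_normal method; the seed is only there to make the run repeatable:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only to make the sketch repeatable

mean = np.array([0.0, 0.0])      # shape (2,)
cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])     # shape (2, 2), symmetric positive semidefinite

samples = rng.multivariate_normal(mean, cov, size=10000)
print(samples.shape)             # (10000, 2)

# The sample covariance should be close to cov, up to sampling noise
print(np.cov(samples, rowvar=False))
```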
I'd like to add an approximation using exponential functions. This directly generates a 2d matrix which contains a movable, symmetric 2d gaussian.
I should note that I found this code on the scipy mailing list archives and modified it a little.
import numpy as np
def makeGaussian(size, fwhm = 3, center=None):
    """ Make a square gaussian kernel.
    size is the length of a side of the square
    fwhm is full-width-half-maximum, which
    can be thought of as an effective radius.
    """
    x = np.arange(0, size, 1, float)
    y = x[:,np.newaxis]
    if center is None:
        x0 = y0 = size // 2
    else:
        x0 = center[0]
        y0 = center[1]
    return np.exp(-4*np.log(2) * ((x-x0)**2 + (y-y0)**2) / fwhm**2)
For reference and enhancements, it is hosted as a gist here. Pull requests welcome!
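To see what fwhm means in practice, the following sketch repeats the function and checks that the kernel drops to half its peak value at a distance of fwhm/2 from the center. It also checks the standard relation fwhm = 2*sqrt(2*ln 2)*sigma; the size, fwhm and center values are arbitrary test inputs:

```python
import numpy as np

def makeGaussian(size, fwhm=3, center=None):
    """Square Gaussian kernel with peak value 1 (same formula as above)."""
    x = np.arange(0, size, 1, float)
    y = x[:, np.newaxis]
    if center is None:
        x0 = y0 = size // 2
    else:
        x0, y0 = center
    return np.exp(-4 * np.log(2) * ((x - x0)**2 + (y - y0)**2) / fwhm**2)

g = makeGaussian(21, fwhm=4, center=(10, 10))

print(g[10, 10])                   # 1.0 at the center
print(np.isclose(g[10, 12], 0.5))  # True: half maximum at distance fwhm/2 = 2

# Equivalent standard deviation: fwhm = 2*sqrt(2*ln 2)*sigma
sigma = 4 / (2 * np.sqrt(2 * np.log(2)))
print(np.isclose(g[10, 12], np.exp(-2**2 / (2 * sigma**2))))  # True
```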
Since the standard 2D Gaussian distribution is just the product of two 1D Gaussian distributions, if there is no correlation between the two axes (i.e. the covariance matrix is diagonal), just call random.gauss twice.
import random
def gauss_2d(mu, sigma):
    x = random.gauss(mu, sigma)
    y = random.gauss(mu, sigma)
    return (x, y)
import numpy as np
# define normalized 2D gaussian
def gaus2d(x=0, y=0, mx=0, my=0, sx=1, sy=1):
    return 1. / (2. * np.pi * sx * sy) * np.exp(-((x - mx)**2. / (2. * sx**2.) + (y - my)**2. / (2. * sy**2.)))
x = np.linspace(-5, 5)
y = np.linspace(-5, 5)
x, y = np.meshgrid(x, y) # get 2D variables instead of 1D
z = gaus2d(x, y)
Straightforward implementation and example of the 2D Gaussian function. Here sx and sy are the spreads in x and y direction, mx and my are the center coordinates.
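One way to check the normalization is to sum the function over a fine grid and multiply by the cell area, which should give a value close to 1. This is a rough numerical sketch; the grid resolution of 201 points per axis is an arbitrary choice:

```python
import numpy as np

def gaus2d(x=0, y=0, mx=0, my=0, sx=1, sy=1):
    return 1. / (2. * np.pi * sx * sy) * np.exp(
        -((x - mx)**2 / (2. * sx**2) + (y - my)**2 / (2. * sy**2)))

xs = np.linspace(-5, 5, 201)
ys = np.linspace(-5, 5, 201)
dx = xs[1] - xs[0]
dy = ys[1] - ys[0]
X, Y = np.meshgrid(xs, ys)

# Riemann-sum approximation of the integral over the grid
total = gaus2d(X, Y).sum() * dx * dy
print(abs(total - 1.0) < 1e-3)  # True
```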
NumPy has a function to do this. It is documented here. In addition to the methods proposed above, it allows drawing samples with arbitrary covariance.
Here is a small example, assuming ipython -pylab is started:
samples = multivariate_normal([-0.5, -0.5], [[1, 0],[0, 1]], 1000)
plot(samples[:, 0], samples[:, 1], '.')
# the covariance matrix must be symmetric positive semidefinite
samples = multivariate_normal([0.5, 0.5], [[0.5, 0.5],[0.5, 0.6]], 1000)
plot(samples[:, 0], samples[:, 1], '.')
In case someone finds this thread and is looking for something a little more versatile (like I did), I have modified the code from @giessel. The code below allows for asymmetry and rotation.
import numpy as np
def makeGaussian2(x_center=0, y_center=0, theta=0, sigma_x = 10, sigma_y=10, x_size=640, y_size=480):
    # x_center and y_center will be the center of the gaussian, theta will be the rotation angle
    # sigma_x and sigma_y will be the stdevs in the x and y axis before rotation
    # x_size and y_size give the size of the frame
    theta = 2*np.pi*theta/360
    x = np.arange(0,x_size, 1, float)
    y = np.arange(0,y_size, 1, float)
    y = y[:,np.newaxis]
    sx = sigma_x
    sy = sigma_y
    x0 = x_center
    y0 = y_center
    # rotation
    a=np.cos(theta)*x -np.sin(theta)*y
    b=np.sin(theta)*x +np.cos(theta)*y
    a0=np.cos(theta)*x0 -np.sin(theta)*y0
    b0=np.sin(theta)*x0 +np.cos(theta)*y0
    return np.exp(-(((a-a0)**2)/(2*(sx**2)) + ((b-b0)**2) /(2*(sy**2))))
We can also use np.random.normal directly to generate samples from a 2D Gaussian distribution whose two components are independent and share the same mean and sigma.
The sample code is np.random.normal(mean, sigma, (num_samples, 2)).
A sample run with mean = 0 and sigma = 20 is shown below:
np.random.normal(0, 20, (10,2))
>>array([[ 11.62158316, 3.30702215],
[-18.49936277, -11.23592946],
[ -7.54555371, 14.42238838],
[-14.61531423, -9.2881661 ],
[-30.36890026, -6.2562164 ],
[-27.77763286, -23.56723819],
[-18.18876597, 41.83504042],
[-23.62068377, 21.10615509],
[ 15.48830184, -15.42140269],
[ 19.91510876, 26.88563983]])
Hence we got 10 samples in a 2d array with mean = 0 and sigma = 20