I have written a piece of code to determine the probability of getting a value Y when X dice are thrown. The equation I use contains a sum over range 0 up-to-and-including Y.
I want to plot the values on a grid (number of dice - sum of values) and have created a grid.
import scipy.special
from matplotlib import pyplot as plt
from matplotlib import cm as cm
import numpy as np
from numpy import exp,arange
def probability_of_sum(sum, dice, faces):
    # Give different names.
    X = sum
    Y = dice
    Z = faces
    # Calculations, calculations, calculations.
    # Dummy values.
    wanted_possibilities = 1
    total_possibilities = 2
    probability = wanted_possibilities/total_possibilities
    return {'wanted_possibilities':wanted_possibilities, 'total_possibilities':total_possibilities, 'probability':probability}
# Main part of the function.
# Consider 1 to 8 dice
# Get probabilities for lowest possible sum (8) to highest possible sum (48)
dice_range = np.linspace(1, 8, num=8)
value_range = np.linspace(1, 8*6, num=8*6)
X,Y = np.meshgrid( dice_range, value_range)
# Calculate value for each grid point separately using VECTORIZE.
value_prob = np.vectorize(probability_of_sum)( Y, X, 6)
The output of my function is now a 2D array of dictionaries (below are dummy values):
[[{'wanted_possibilities': 0, 'total_possibilities':1, 'probability':0}
{'wanted_possibilities': 1, 'total_possibilities':1, 'probability':1}]
[{'wanted_possibilities': 0, 'total_possibilities':2, 'probability':0}
{'wanted_possibilities': 1, 'total_possibilities':2, 'probability':0.5}]
[{'wanted_possibilities': 0, 'total_possibilities':3, 'probability':0}
{'wanted_possibilities': 1, 'total_possibilities':3, 'probability':0.33}]
[{'wanted_possibilities': 0, 'total_possibilities':4, 'probability':0}
{'wanted_possibilities': 1, 'total_possibilities':4, 'probability':0.25}] ]
How do I continue from here to plot the output ('probability') as a 2D function of X and Y?
value_prob['probability']
gives me the error
IndexError: only integers, slices(:), ellipsis (...), numpy.newaxis(None) and integer or boolean arrays are valid indices.
I can access individual elements by using
(value_prob[1,1])['probability']
but then I would need a loop to plot the function. Is there a more general/powerful/cleaner way of accessing this?
EDIT: I would like to plot 'probability' against (dice, value).
You can try this:
import matplotlib.pyplot as plt
import itertools
import numpy as np
s = [[{'Wanted_possibilities': 0, 'total_possibilities':1, 'probability':0},
{'Wanted_possibilities': 1, 'total_possibilities':1, 'probability':1}],
[{'Wanted_possibilities': 0, 'total_possibilities':2, 'probability':0},
{'Wanted_possibilities': 1, 'total_possibilities':2, 'probability':0.5}],
[{'Wanted_possibilities': 0, 'total_possibilities':3, 'probability':0},
{'Wanted_possibilities': 1, 'total_possibilities':3, 'probability':0.33}],
[{'Wanted_possibilities': 0, 'total_possibilities':4, 'probability':0},
{'Wanted_possibilities': 1, 'total_possibilities':4, 'probability':0.25}] ]
final_data = list(itertools.chain(*[[[i['Wanted_possibilities'], i['total_possibilities'], i['probability']] for i in b] for b in s]))
plt.bar(np.arange(len([i[-1] for i in final_data])), [i[0] for i in final_data], 0.60, yerr = [i[1] for i in final_data])
plt.ylabel('Probabilities')
plt.xticks(np.arange(len([i[-1] for i in final_data])), tuple(map(str, [i[-1] for i in final_data])))
plt.show()
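If the goal is only to plot 'probability' against (dice, value), another option is to pull that one key out of every dictionary into a plain float array first. The following is a minimal sketch, not the asker's real calculation: probability_of_sum is the dummy version from the question, and get_probability and plot_probability are hypothetical helper names introduced here for illustration.

```python
import numpy as np

def probability_of_sum(total, dice, faces):
    # Dummy stand-in for the real calculation, as in the question.
    wanted = 1
    possible = 2
    return {'wanted_possibilities': wanted,
            'total_possibilities': possible,
            'probability': wanted / possible}

dice_range = np.arange(1, 9)            # 1 to 8 dice
value_range = np.arange(1, 8 * 6 + 1)   # sums 1 to 48
X, Y = np.meshgrid(dice_range, value_range)

# 2D object array holding one dict per grid point.
value_prob = np.vectorize(probability_of_sum, otypes=[object])(Y, X, 6)

# Hypothetical helper: extract one key from every dict as a float array.
get_probability = np.vectorize(lambda d: d['probability'], otypes=[float])
prob = get_probability(value_prob)

def plot_probability(X, Y, prob):
    # Hypothetical helper: colour map of probability over (dice, sum).
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    mesh = ax.pcolormesh(X, Y, prob, shading='auto')
    fig.colorbar(mesh, ax=ax, label='probability')
    ax.set_xlabel('number of dice')
    ax.set_ylabel('sum of values')
    plt.show()
```

Calling plot_probability(X, Y, prob) should then draw the grid. If the dictionary is not needed elsewhere, having probability_of_sum return only the probability would avoid the extraction step altogether.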
Related
I have a list named "y" with 8 numpy arrays of the shape (180000,)
Now I want to create a new numpy array named "collisions" with the same shape that counts, for each position, how many of the arrays in y are not 0 there. See the following example:
import numpy as np
collisions = np.zeros(len(y[0]), dtype=np.uint8)
for yi in y:
    collisions[np.where(yi > 0)] += 1
The calculation of this function takes a relatively long time. Is there a faster implementation to do this?
I am not sure why your calculation takes so long; I hope this helps to clarify. For example, suppose your list of arrays looks like this:
import numpy as np
y = [np.random.normal(0,1,180000) for i in range(8)]
Running your code, it works ok:
collisions = np.zeros(len(y[0]), dtype=np.uint8)
for yi in y:
    collisions[np.where(yi > 0)] += 1
collisions
array([4, 2, 4, ..., 4, 4, 5], dtype=uint8)
You can do it a bit faster like this: stack your list of arrays into a matrix, test for > 0, and sum along axis 0 (that is, across the 8 arrays). That said, I don't see a problem with the loop above:
(np.array(y)>0).sum(axis=0)
array([4, 2, 4, ..., 4, 4, 5])
I'm assuming you're looking for something like this:
import numpy as np
# simulating your data by randomly generating numbers in [-0.5, 0.5)
y = np.random.rand(8, 180_000) - 0.5
print(y.shape) # (8, 180000)
collisions = np.sum(y > 0, axis=0, dtype=np.uint8)
print(collisions.shape) # (180000,)
print(collisions) # [4 4 4 ... 1 6 7]
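One detail worth checking: the question asks for values that "are not 0", while all of the snippets above test for > 0. The two differ as soon as negative values appear. A small sketch with made-up data (the three short arrays below are assumptions for illustration only) shows the difference, using np.count_nonzero for the "not 0" reading:

```python
import numpy as np

# Made-up stand-in for the list of 8 arrays of shape (180000,).
y = [np.array([0.0, 1.5, -2.0, 0.0]),
     np.array([3.0, 0.0, -1.0, 0.0]),
     np.array([0.0, 2.0, 4.0, 0.0])]

stacked = np.stack(y)  # shape (3, 4): one row per array

# Per position: how many arrays hold a value different from 0.
nonzero_counts = np.count_nonzero(stacked, axis=0)

# Per position: how many arrays hold a strictly positive value.
positive_counts = (stacked > 0).sum(axis=0)

print(nonzero_counts)   # [1 2 3 0]
print(positive_counts)  # [1 2 1 0]
```

If the data can never be negative, both give the same result and either can be used.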
I have to plot many curves in the same graph. The x values are the same array for all of them, running from 0 to N. The y values for each curve are arrays that start with zeros and begin having positive values at a different x, depending on the curve.
EXAMPLE:
x = np.arange(100)
y1 = [0, 0, 10, 12 , 53, ... , n]
y2 = [0, 0, 0, 12 , 53, ... , n]
y3 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 40, 67, 53, ... , n]
When I plot, there is a vertical line that goes from the bottom to the first positive value of y. In the case of y1, there is a line from (1, 0) to (2, 10); that is the line I want to avoid, so that the plot just starts from (2, 10).
I know I can create new arrays for x and y to match the conditions I want, but I really need to know if there is another way.
There is an image with one example of my current plot.
CODE:
import pandas as pd
import numpy as np
import xlrd
import matplotlib.pyplot as plt
# This is an Excel file where a user types a number; this number will be the number of months.
workbook = xlrd.open_workbook('INPUT.xlsx')
sheet1 = workbook.sheet_by_name('ASSUMPTIONS')
Num_Meses = np.array([i for i in range(int(sheet1.cell(5, 5).value) + 1)])
# Then I create a dictionary from which I take the arrays (YPP, Y1P, Y2P are of type 'numpy.ndarray')
filt = df['WELL TYPE'] == 'PP'
YPP = df.loc[filt, 'OIL PRODUCTION'][0]
filt = df['WELL TYPE'] == '1P'
Y1P = df.loc[filt, 'OIL PRODUCTION'][0] + YPP
filt = df['WELL TYPE'] == '2P'
Y2P = df.loc[filt, 'OIL PRODUCTION'][0] + Y1P
filt = df['WELL TYPE'] == '3P'
Y3P = df.loc[filt, 'OIL PRODUCTION'][0] + Y2P
plt.plot(Num_Meses, Y3P, label='3P')
plt.plot(Num_Meses, Y2P, label='2P')
plt.plot(Num_Meses, Y1P, label='1P')
plt.plot(Num_Meses, YPP, label='PP', color='k')
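One possible way to avoid the rising line without building shortened x and y arrays is to replace the leading zeros with NaN, since matplotlib leaves a gap wherever a line plot encounters NaN. The sketch below is an assumption about what is wanted, not a tested fix for the workbook code above; hide_leading_zeros is a hypothetical helper name.

```python
import numpy as np

def hide_leading_zeros(y):
    """Return a float copy of y with the leading run of zeros set to NaN."""
    masked = np.array(y, dtype=float)     # copy, so the input is untouched
    nonzero = np.flatnonzero(masked)      # indices of all non-zero entries
    if nonzero.size == 0:
        masked[:] = np.nan                # nothing to draw at all
    else:
        masked[:nonzero[0]] = np.nan      # blank out everything before the first value
    return masked

y1 = [0, 0, 10, 12, 0, 53]
print(hide_leading_zeros(y1))  # [nan nan 10. 12.  0. 53.]
```

With this, plt.plot(x, hide_leading_zeros(y1)) would start the curve at (2, 10). Zeros that occur after the first positive value are kept, and x stays the same length as before.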
A test code for this type of data:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x = np.linspace(0,1,20)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0])
n = np.size(x)
mean = sum(x*y)/n
sigma = np.sqrt(sum(y*(x-mean)**2)/n)
def gaus(x,a,x0,sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))
popt,pcov = curve_fit(gaus,x,y,p0=[max(y),mean,sigma])
plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
I need to fit lots of data which is just like the y array given above to a Gaussian distribution.
Using the standard gaussian fitting routine using scipy.optimize gives this kind of fit:
I have tried many different initial values, but cannot get any kind of fit.
Does anyone have any ideas how I could get this data fitted to a Gaussian?
Thanks
The problem
Your fundamental problem is that you have a severely underdetermined fitting problem. Think about it like this: you have three unknowns but only one datapoint. This is akin to solving for x, y, z when you only have one equation. Because the height of your Gaussian can vary independently of its width, there are infinitely many distributions, all with different widths, that will satisfy the constraints of your fit.
More directly, your a and sigma parameters can both change the maximum height of the distribution, which is pretty much the only thing that matters in terms of achieving a good fit (at least once the distribution is centered and fairly narrow). Thus, the fitting routines in SciPy can't figure out which to change at any given step.
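The degeneracy can be checked numerically. The sketch below reuses the gaus function and the data from the question; the three sigma values are arbitrary choices for illustration. Once the curve is centered on the single non-zero point and is narrower than the grid spacing, very different widths all give an essentially zero residual.

```python
import numpy as np

def gaus(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))

x = np.linspace(0, 1, 20)
y = np.zeros(20)
y[10] = 10

x0 = x[np.argmax(y)]  # centre the curve on the one non-zero point

# Sum of squared residuals for several widths, all with the same height a = 10.
residuals = {}
for sigma in (0.001, 0.005, 0.01):
    residuals[sigma] = float(((gaus(x, 10, x0, sigma) - y)**2).sum())

for sigma, res in residuals.items():
    print(f"sigma={sigma}: residual={res:.3e}")
```

All three residuals come out far below any reasonable tolerance, so a least-squares routine has nothing to distinguish one width from another.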
The fix
The simplest way to solve the problem is to lock down one of your parameters. You don't need to change your equation, but you do need to make at least one of a, x0, or sigma a constant. The best choice of parameter to fix is probably x0, since it's trivial to determine the mean/median/mode of your data by just getting the x coordinate of the one datapoint that is non-zero in y. You'll also need to get a little more clever about how you set your initial guesses. Here's what that looks like:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x = np.linspace(0,1,20)
xdiff = x[1] - x[0]
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0])
# the mean/median/mode all occur at the x coordinate of the one datapoint that is non-zero in y
mean = x[np.argmax(y)]
# sigma should be tiny, since we want a narrow distribution
sigma = xdiff
# the scaling factor should be roughly equal to the "height" of the one datapoint
a = y.max()
def gaus(x,a,sigma):
    return a*np.exp(-(x-mean)**2/(2*sigma**2))
bounds = ((1, .015), (20, 1))
popt,pcov = curve_fit(gaus, x, y, p0=[a, sigma], maxfev=20000, bounds=bounds)
residual = ((gaus(x,*popt) - y)**2).sum()
plt.figure(figsize=(8,6))
plt.plot(x,y,'b+:',label='data')
xdist = np.linspace(x.min(), x.max(), 1000)
plt.plot(xdist,gaus(xdist,*popt),'C0', label='fit distribution')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.text(.1,6,"residual: %.6e" % residual)
plt.legend()
plt.show()
Output:
The better fix
You don't need a fit to get the kind of Gaussians you want. You can instead use a simple closed form expression to calculate the parameters that you need, as in the fitonegauss function in the code below:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def gauss(x, a, mean, sigma):
    return a*np.exp(-(x - mean)**2/(2*sigma**2))
def fitonegauss(x, y, fwhm=None):
    if fwhm is None:
        # determine full width at half maximum from the spacing between the x points
        fwhm = (x[1] - x[0])
    # the mean/median/mode all occur at the x coordinate of the one datapoint that is non-zero in y
    mean = x[np.argmax(y)]
    # solve for sigma in terms of the desired full width at half maximum
    sigma = fwhm/(2*np.sqrt(2*np.log(2)))
    # gauss() is not normalized, so its peak value is simply a; use the data maximum
    a = y.max()
    return a, mean, sigma
N = 20
x = np.linspace(0,1,N)
y = np.zeros(N)
y[N//2] = 10
popt = fitonegauss(x, y)
plt.figure(figsize=(8,6))
plt.plot(x,y,'b+:',label='data')
xdist = np.linspace(x.min(), x.max(), 1000)
plt.plot(xdist,gauss(xdist,*popt),'C0', label='fit distribution')
residual = ((gauss(x,*popt) - y)**2).sum()
plt.plot(x, gauss(x,*popt),'ro:',label='fit')
plt.text(.1,6,"residual: %.6e" % residual)
plt.legend()
plt.show()
Output:
The advantages of this approach are many. It's far more computationally efficient than any fit could be, it will (for the most part) never fail, and it gives you far more control over the actual width of the distribution that you end up with.
The fitonegauss function is set up so that you can directly set the full width at half maximum of the fitted distribution. If you leave it unset, the code will automatically guess it from the spacing of the x data. This seems to produce reasonable results for your application.
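The conversion from full width at half maximum to sigma used in fitonegauss, sigma = fwhm / (2*sqrt(2*ln 2)), can be verified with a quick check. The numbers below (a = 10, mean = 0.5, fwhm = 0.2) are arbitrary values chosen for illustration: evaluating the curve half a width to either side of the mean should give exactly half the peak height.

```python
import numpy as np

def gauss(x, a, mean, sigma):
    return a * np.exp(-(x - mean)**2 / (2 * sigma**2))

a, mean, fwhm = 10.0, 0.5, 0.2  # arbitrary illustrative values

# Same conversion as in fitonegauss.
sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))

# The two points where the curve should drop to half of its maximum.
half_points = np.array([mean - fwhm / 2, mean + fwhm / 2])
values = gauss(half_points, a, mean, sigma)

print(values)  # both entries equal a / 2 up to rounding
```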
Don't use a general "a" parameter, use the proper normal distribution equation instead:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x = np.linspace(0,1,20)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0])
n = np.size(x)
mean = sum(x*y)/n
sigma = np.sqrt(sum(y*(x-mean)**2)/n)
def gaus(x, x0, sigma):
    return 1/np.sqrt(2 * np.pi * sigma**2)*np.exp(-(x-x0)**2/(2*sigma**2))
popt,pcov = curve_fit(gaus,x,y,p0=[mean,sigma])
plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
I have written a code that plots random walks. There are traj different random walks generated and each consists of n steps. I would like to animate their moves. How can I do that?
My code below:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def random_walk_2D(n, traj = 1):
    for i in range(traj):
        skoki = np.array([[0, 1], [1, 0], [-1, 0], [0, -1]])
        losy = np.random.randint(4, size = n)
        temp = skoki[losy, :]
        x = np.array([[0, 0]])
        temp1 = np.concatenate((x, temp), axis = 0)
        traj = np.cumsum(temp1, axis = 0)
        plt.plot(traj[:, 0], traj[:, 1])
        plt.plot(traj[-1][0], traj[-1][1], 'ro') #the last point
    plt.show()
As it stands now, you generate traj in one shot. I mean that traj in traj = np.cumsum(temp1, axis = 0) already contains all the "story" from the beginning to the end. If you want to create an animation that is in "real time", you should not generate traj in one shot, but iteratively, plotting new steps as they come. What about doing:
import numpy as np
import matplotlib.pyplot as plt
def real_time_random_walk_2D_NT(
    nb_steps, nb_trajs, with_dots=False, save_trajs=False, tpause=.01
    ):
    """
    Parameters
    ----------
    nb_steps : integer
        number of steps
    nb_trajs : integer
        number of trajectories
    save_trajs : boolean (optional)
        If True, entire trajectories are saved rather than
        saving only the last steps needed for plotting.
        False by default.
    with_dots : boolean (optional)
        If True, dots representative of random-walking entities
        are displayed. Has precedence over `save_trajs`.
        False by default.
    tpause : float (optional)
        Pausing time between 2 steps. .01 seconds by default.
    """
    skoki = np.array([[0, 1], [1, 0], [-1, 0], [0, -1]])
    trajs = np.zeros((nb_trajs, 1, 2))
    for i in range(nb_steps):
        _steps = []
        for j in range(nb_trajs):
            traj = trajs[j,:,:]
            losy = np.random.randint(4, size = 1)
            temp = skoki[losy, :]
            traj = np.concatenate((traj, temp), axis = 0)
            traj[-1,:] += traj[-2,:]
            _steps.append(traj)
        if save_trajs or with_dots:
            trajs = np.array(_steps)
            if with_dots:
                plt.cla()
                plt.plot(trajs[:,i, 0].T, trajs[:,i, 1].T, 'ro') ## There is leeway in avoiding these costly transpositions
                plt.plot(trajs[:,:i+1, 0].T, trajs[:,:i+1, 1].T)
            else:
                plt.plot(trajs[:,-1+i:i+1, 0].T, trajs[:,-1+i:i+1, 1].T)
        else:
            trajs = np.array(_steps)[:,-2:,:]
            plt.plot(trajs[:,:, 0].T, trajs[:,:, 1].T)
        plt.pause(tpause)
real_time_random_walk_2D_NT(50, 6, with_dots=True)
real_time_random_walk_2D_NT(50, 6)
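If generating the steps in real time is not a requirement, a different route is to compute all trajectories up front and let matplotlib's FuncAnimation reveal them frame by frame. This is only a sketch of that alternative, not the approach used above; generate_walks and animate_walks are hypothetical names, and the axis limits and interval are arbitrary choices.

```python
import numpy as np

def generate_walks(nb_steps, nb_trajs, rng=None):
    """Return an array of shape (nb_trajs, nb_steps + 1, 2) starting at the origin."""
    rng = np.random.default_rng() if rng is None else rng
    moves = np.array([[0, 1], [1, 0], [-1, 0], [0, -1]])
    choices = rng.integers(4, size=(nb_trajs, nb_steps))
    steps = moves[choices]                                # (nb_trajs, nb_steps, 2)
    start = np.zeros((nb_trajs, 1, 2), dtype=steps.dtype)
    return np.cumsum(np.concatenate((start, steps), axis=1), axis=1)

def animate_walks(trajs, interval=50):
    """Show the walks growing one step per frame; returns the animation object."""
    import matplotlib.pyplot as plt
    from matplotlib.animation import FuncAnimation

    fig, ax = plt.subplots()
    lo, hi = trajs.min() - 1, trajs.max() + 1
    ax.set_xlim(lo, hi)
    ax.set_ylim(lo, hi)
    lines = [ax.plot([], [])[0] for _ in trajs]           # one line per trajectory

    def update(frame):
        # Draw each trajectory up to and including the current step.
        for line, traj in zip(lines, trajs):
            line.set_data(traj[:frame + 1, 0], traj[:frame + 1, 1])
        return lines

    anim = FuncAnimation(fig, update, frames=trajs.shape[1],
                         interval=interval, blit=True)
    plt.show()
    return anim
```

A call such as anim = animate_walks(generate_walks(50, 6)) would then play six walks of 50 steps each. Keeping a reference to the returned object matters, because the animation stops if it is garbage collected.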
What is the easiest and fastest way to interpolate between two arrays to get a new array?
For example, I have 3 arrays:
x = np.array([0,1,2,3,4,5])
y = np.array([5,4,3,2,1,0])
z = np.array([0,5])
x and y correspond to data points and z is an argument. So at z=0 the x array is valid, and at z=5 the y array is valid. But I need to get a new array for z=1. That could easily be solved by:
a = (y-x)/(z[1]-z[0])*1+x
The problem is that the data is not linearly dependent and there are more than 2 arrays with data. Is it perhaps possible to use spline interpolation somehow?
This is a univariate to multivariate regression problem. Scipy supports univariate to univariate regression, and multivariate to univariate regression. But you can instead iterate over the outputs, so this is not such a big problem. Below is an example of how it can be done. I've changed the variable names a bit and added a new point:
import numpy as np
from scipy.interpolate import interp1d
X = np.array([0, 5, 10])
Y = np.array([[0, 1, 2, 3, 4, 5],
[5, 4, 3, 2, 1, 0],
[8, 6, 5, 1, -4, -5]])
XX = np.array([0, 1, 5]) # Find YY for these
YY = np.zeros((len(XX), Y.shape[1]))
for i in range(Y.shape[1]):
    f = interp1d(X, Y[:, i])
    for j in range(len(XX)):
        YY[j, i] = f(XX[j])
YY then holds the interpolated results for XX. Hope it helps.
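As a possible shortcut, interp1d also accepts an axis argument, so the two loops may not be needed: interpolating along axis 0 handles all columns at once. The sketch below uses the same X, Y and XX as above. Passing kind='quadratic' is shown only as one way to get a non-linear (spline) interpolation with three support points; whether that suits the real data is an assumption.

```python
import numpy as np
from scipy.interpolate import interp1d

X = np.array([0, 5, 10])
Y = np.array([[0, 1, 2, 3, 4, 5],
              [5, 4, 3, 2, 1, 0],
              [8, 6, 5, 1, -4, -5]])
XX = np.array([0, 1, 5])  # find YY for these

# Linear interpolation of every column in one call.
YY = interp1d(X, Y, axis=0)(XX)

# Spline variant; a quadratic spline needs at least 3 support points.
YY_spline = interp1d(X, Y, axis=0, kind='quadratic')(XX)

print(YY[1])  # [1.  1.6 2.2 2.8 3.4 4. ]
```

Both results have shape (len(XX), Y.shape[1]), and both reproduce the rows of Y exactly at the support points 0 and 5.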