I'm trying to optimise several functions using the brute-force method of lmfit (built on scipy.optimize). The function I'm minimising can have a variable number of parameters passed into it, each with its own optimisation range.
I've made a simple example to demonstrate.
import numpy as np
from matplotlib import pyplot as plt
import lmfit
def my_fun(param):  # function to be optimised
    return -1. * (.1 * param['a']**2 + 2. * param['b'] - 5. * \
                  param['c']**0.5 - param['d'] + param['e'])

def brute_wrapper(optimiser_parameters):
    """ so I can optimise my_fun() across any parameter set """
    initial = {'a': 1., 'b': 2., 'c': 3., 'd': 4., 'e': 5.}
    parameters = optimiser_parameters.valuesdict()
    for key in initial.keys():  # replace parameters established in optimiser
        if key in parameters.keys():
            initial[key] = parameters[key]
    return my_fun(initial)  # fitness indicator
I can plot the results easily if I'm only varying two parameters, like so:
# calculating and plotting for 2
optimisers = lmfit.Parameters()
optimisers.add("b", min=1, max=5, brute_step=1)
optimisers.add("e", min=5, max=11, brute_step=1)
brute = lmfit.minimize(brute_wrapper, optimisers, method='brute')
fig, ax = plt.subplots(1)
x, y = brute.brute_grid
value = -1 * np.array(brute.brute_Jout)
image = ax.pcolormesh(x, y, value)
fig.colorbar(image)
ax.set_xlabel(brute.var_names[0])
ax.set_ylabel(brute.var_names[1])
plt.show()
but for 3 or more parameters I'd like to arrange the heatmaps in a grid, one plot for each pairing (b~c, b~d, b~e, ..., d~e), without doubling up (see example at the end).
# calculating and plotting for 4
optimisers = lmfit.Parameters()
optimisers.add("b", min=1, max=5, brute_step=1)
optimisers.add("c", min=2, max=8, brute_step=1)
optimisers.add("d", min=1, max=6, brute_step=1)
optimisers.add("e", min=5, max=11, brute_step=1)
brute = lmfit.minimize(brute_wrapper, optimisers, method='brute')
# how to structure data for plot?
I tried to use corner.corner and dissected some code from plot_mcmc() in scipy with no luck.
How do I deconstruct the data from brute and make such a plot?
I made a crappy picture to show what I mean (universal colour bar is a fool's hope)
To iterate over the pairs of variables you can use itertools.combinations() on the variable names (or indices if you prefer).
>>> import itertools
>>> list(itertools.combinations(brute.var_names, 2))
[('b', 'c'), ('b', 'd'), ('b', 'e'), ('c', 'd'), ('c', 'e'), ('d', 'e')]
This will give you the x, y variable pairs for each plot.
You will also need to specify the matplotlib.pyplot.subplot position index for each plot,
i.e. for 4 parameters, the lower triangle of a 3x3 grid with these positions:
(1) (2) (3)
(4) (5) (6)
(7) (8) (9)
To match the order of the combinations() output above you need the numbers in the order [1, 4, 7, 5, 8, 9]
You can get this with something like this:
def get_tril_positions(n):
    """Lower square triangular position"""
    tril_pos = np.tril((np.arange(n**2) + 1).reshape(n, -1)).T.ravel()
    return tril_pos[tril_pos != 0]
Where n is the number of parameters minus 1.
Assuming that for the x-y parameters you would like the minimum residual values (letting all other parameters vary) then you can collapse the brute_grid and the brute_Jout along the other axis using np.amin().
Now with the N-d array collapsed into a 2-d array you can plot it "normally" like a 2-d array.
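As a small illustration of the collapsing step (a sketch on a made-up 3-d array rather than real brute_Jout data; the array jout and its shape are assumptions for the example), taking the minimum over every axis except the two being plotted leaves a 2-d array:

```python
import numpy as np

# stand-in for brute.brute_Jout with three varied parameters (shape 2 x 3 x 4)
jout = np.arange(24, dtype=float).reshape(2, 3, 4)

xi, yi = 0, 2  # axes of the two parameters to plot
# every other axis gets collapsed
axes = tuple(ii for ii in range(jout.ndim) if ii not in (xi, yi))

min_jout = np.amin(jout, axis=axes)  # 2-d array of shape (2, 4)
```

The same axes tuple can be applied to the x and y meshgrids: they are constant along the collapsed axes, so taking the minimum simply recovers their 2-d versions.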
Combining the above, I get something like this:
from itertools import combinations
n = len(brute.var_names) - 1
combos = list(combinations(brute.var_names, 2))
positions = get_tril_positions(n)
for (xname, yname), pos in zip(combos, positions):
    # Specify subplot
    ax = plt.subplot(n, n, pos)
    # Find index for these variables
    xi = brute.var_names.index(xname)
    yi = brute.var_names.index(yname)
    # get the meshgrids for x and y
    X = brute.brute_grid[xi]
    Y = brute.brute_grid[yi]
    # Find other axis to collapse.
    axes = tuple([ii for ii in range(brute.brute_Jout.ndim) if ii not in (xi, yi)])
    # Collapse to minimum Jout
    min_jout = np.amin(brute.brute_Jout, axis=axes)
    min_xgrid = np.amin(X, axis=axes)
    min_ygrid = np.amin(Y, axis=axes)
    mesh = ax.pcolormesh(min_xgrid, min_ygrid, min_jout)
    # Add colorbar to each plot (pass the mesh explicitly so it is always found)
    plt.colorbar(mesh, ax=ax)
    # Add labels to edge only
    if pos >= n**2 - n:
        ax.set_xlabel(xname)
    if pos % n == 1:
        ax.set_ylabel(yname)
plt.tight_layout()
plt.show()
Which produces what you want.
[Figure: corner plot]
Note that I did not multiply brute_Jout by -1, so you may need to use np.amax instead if you are plotting your value array.
I have the following data set, for which I have to estimate the joint density of 'bwt' and 'age' using kernel density estimation with a 2-dimensional Gaussian kernel and width h=5. I can't use modules such as scipy that have ready-made functions for this, so I have to build the functions to calculate the density myself. Here's what I've got so far.
import numpy as np
import pandas as pd
babies_full = pd.read_csv("https://www2.helsinki.fi/sites/default/files/atoms/files/babies2.txt", sep='\t')
#Getting the columns I need
babies_full1=babies_full[['gestation', 'age']]
x=np.array(babies_full1,'int')
#2d Gaussian kernel
def k_2dgauss(x):
    return np.exp(-np.sum(x**2, 1)/2) / np.sqrt(2*np.pi)
#Multivariate kernel density
def mv_kernel_density(t, x, h):
    d = x.shape[1]
    return np.mean(k_2dgauss((t - x)/h))/h**d
t = np.linspace(1.0, 5.0, 50)
h=5
print(mv_kernel_density(t, x, h))
However, I get the error 'ValueError: operands could not be broadcast together with shapes (50,) (1173,2)', which I think is because the arrays have different shapes. I also don't understand why k_2dgauss(x) returns an array of zeros for me, since it should only return one value. In general, I am new to the concept of kernel density estimation and I don't really know if I've written the functions right, so any hints would help!
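One way to get rid of the broadcasting error (a minimal sketch, not the only way to write it) is to make the query points t a 2-D array of shape (m, 2), one row per point at which the density is wanted, and broadcast it against the (n, 2) sample array. The names gauss_kernel and kde are made up for this sketch, and the data below are random numbers rather than the babies data set. Note that the normalising constant of a d-dimensional standard Gaussian is (2*pi)**(d/2), i.e. 2*pi for d = 2.

```python
import numpy as np

# gauss_kernel and kde are illustrative names, not functions from the question

def gauss_kernel(u):
    """Standard Gaussian kernel; u has shape (..., d), one point per last axis."""
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2 * np.pi) ** (d / 2)

def kde(t, x, h):
    """Kernel density estimate at query points t (m, d) from samples x (n, d)."""
    d = x.shape[1]
    # (m, 1, d) - (1, n, d) broadcasts to (m, n, d): every query against every sample
    u = (t[:, None, :] - x[None, :, :]) / h
    return gauss_kernel(u).mean(axis=1) / h**d

# synthetic stand-in data: 200 samples in 2 dimensions
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))

# query points on a regular 61 x 61 grid
g = np.linspace(-6.0, 6.0, 61)
G1, G2 = np.meshgrid(g, g, indexing='ij')
t = np.column_stack([G1.ravel(), G2.ravel()])

density = kde(t, x, h=0.5)                 # one density value per query point
total = density.sum() * (g[1] - g[0])**2   # Riemann sum, should be close to 1
```

As for the zeros: k_2dgauss(x) applied to the raw data returns one value per row, and with gestation values in the hundreds the exponent is so negative that np.exp underflows to 0.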
Following on from my comments on your original post, I think this is what you want to do, but if not then come back to me and we can try again.
# info supplied by OP
import numpy as np
import pandas as pd
babies_full = \
    pd.read_csv("https://www2.helsinki.fi/sites/default/files/atoms/files/babies2.txt", sep='\t')
#Getting the columns I need
babies_full1=babies_full[['gestation', 'age']]
x=np.array(babies_full1,'int')
# my contributions
from math import floor, ceil
def binMaker(arr, base):
    """function I already use for this sort of thing.
    arr is the arr I want to make bins for
    base is the bin separation, but does require you to import floor and ceil
    otherwise you can make these bins manually yourself"""
    binMin = floor(arr.min() / base) * base
    binMax = ceil(arr.max() / base) * base
    return np.arange(binMin, binMax + base, base)
bins1 = binMaker(x[:,0], 20.) # bins from 140. to 360. spaced 20 apart
bins2 = binMaker(x[:,1], 5.) # bins from 15. to 45. spaced 5. apart
counts = np.zeros((len(bins1)-1, len(bins2)-1)) # empty array for counts to go in
for i in range(0, len(bins1)-1): # loop over the intervals, hence the -1
    boo = (x[:,0] >= bins1[i]) * (x[:,0] < bins1[i+1])
    for j in range(0, len(bins2)-1): # loop over the intervals, hence the -1
        counts[i,j] = np.count_nonzero((x[boo,1] >= bins2[j]) *
                                       (x[boo,1] < bins2[j+1]))
# if you want your PDF to be a fraction of the total
# rather than the number of counts, do the next line
counts /= x.shape[0]
# plotting
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# setting the levels so that each number in counts has its own colour
levels = np.linspace(-0.5, counts.max()+0.5, int(counts.max())+2)
cmap = plt.get_cmap('viridis') # or any colormap you like
norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
fig, ax = plt.subplots(1, 1, figsize=(6,5), dpi=150)
pcm = ax.pcolormesh(bins2, bins1, counts, ec='k', lw=1)
fig.colorbar(pcm, ax=ax, label='Counts (%)')
ax.set_xlabel('Age')
ax.set_ylabel('Gestation')
ax.set_xticks(bins2)
ax.set_yticks(bins1)
plt.title('Manually making a 2D (joint) PDF')
If this is what you wanted, then there is an easier way with np.histogram2d, although I think you specified that you had to use your own methods rather than built-in functions. I've included it anyway for completeness' sake.
pdf = np.histogram2d(x[:,0], x[:,1], bins=(bins1,bins2))[0]
pdf /= x.shape[0] # again for normalising and making a percentage
levels = np.linspace(-0.5, pdf.max()+0.5, int(pdf.max())+2)
cmap = plt.get_cmap('viridis') # or any colormap you like
norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
fig, ax = plt.subplots(1, 1, figsize=(6,5), dpi=150)
pcm = ax.pcolormesh(bins2, bins1, pdf, ec='k', lw=1)
fig.colorbar(pcm, ax=ax, label='Counts (%)')
ax.set_xlabel('Age')
ax.set_ylabel('Gestation')
ax.set_xticks(bins2)
ax.set_yticks(bins1)
plt.title('using np.histogram2d to make a 2D (joint) PDF')
Final note: in this example, the only place where counts doesn't equal pdf is the bin with 40 <= age < 45 and 280 <= gestation < 300. This is due to how the bin edges are handled: in my manual case I've used >= and < for every bin, whereas np.histogram2d treats the last bin along each axis as closed, so a value sitting exactly on the final edge is still counted. We can see the element of x that is responsible:
>>> print(x[1011])
[280 45]
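A small check of the edge handling on made-up numbers (np.histogram follows the same rule per axis as np.histogram2d: every bin is half-open except the last, which includes its right edge). The ages and edges arrays are invented for this sketch:

```python
import numpy as np

ages = np.array([15, 30, 44, 45])   # made-up values; 45 sits on the final edge
edges = np.array([15, 30, 45])

# np.histogram: bins are [15, 30) and [30, 45], so 45 is counted
np_counts, _ = np.histogram(ages, bins=edges)

# manual half-open bins [lo, hi) for every bin, as in the loop above: 45 is dropped
manual_counts = np.array([np.count_nonzero((ages >= lo) & (ages < hi))
                          for lo, hi in zip(edges[:-1], edges[1:])])

print(np_counts)      # [1 3]
print(manual_counts)  # [1 2]
```

To make the manual version agree, the comparison for the last bin along each axis could use <= instead of <.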
I have created a tornado plot taking inspiration from here. It has input variables labelled on the y-axis (a1,b1,c1...) and their respective correlation coefficients plotted next to them. See pic below:
I then sorted the correlation coefficients so that the one with the highest absolute value (without losing its sign) gets plotted first, then the next highest, and so on, using sorted(values, key=abs, reverse=True). See the result below:
Notice that in the second picture, even though the bars are sorted in descending order of absolute value, the y-axis labels stay the same.
Question: How do I tie each y-axis label (variable) to its correlation coefficient so that the label always stays with its bar?
Below is my code:
import numpy as np
from matplotlib import pyplot as plt
#####Importing Data from csv file#####
dataset1 = np.genfromtxt('dataSet1.csv', dtype = float, delimiter = ',', skip_header = 1, names = ['a', 'b', 'c', 'x0'])
dataset2 = np.genfromtxt('dataSet2.csv', dtype = float, delimiter = ',', skip_header = 1, names = ['a', 'b', 'c', 'x0'])
dataset3 = np.genfromtxt('dataSet3.csv', dtype = float, delimiter = ',', skip_header = 1, names = ['a', 'b', 'c', 'x0'])
corr1 = np.corrcoef(dataset1['a'],dataset1['x0'])
corr2 = np.corrcoef(dataset1['b'],dataset1['x0'])
corr3 = np.corrcoef(dataset1['c'],dataset1['x0'])
corr4 = np.corrcoef(dataset2['a'],dataset2['x0'])
corr5 = np.corrcoef(dataset2['b'],dataset2['x0'])
corr6 = np.corrcoef(dataset2['c'],dataset2['x0'])
corr7 = np.corrcoef(dataset3['a'],dataset3['x0'])
corr8 = np.corrcoef(dataset3['b'],dataset3['x0'])
corr9 = np.corrcoef(dataset3['c'],dataset3['x0'])
np.set_printoptions(precision=4)
variables = ['a1','b1','c1','a2','b2','c2','a3','b3','c3']
base = 0
values = np.array([corr1[0,1],corr2[0,1],corr3[0,1],
corr4[0,1],corr5[0,1],corr6[0,1],
corr7[0,1],corr8[0,1],corr9[0,1]])
values = sorted(values,key=abs, reverse=True)
# The y position for each variable
ys = range(len(values))[::-1] # top to bottom
# Plot the bars, one by one
for y, value in zip(ys, values):
    high_width = base + value
    #print high_width
    # Each bar is a "broken" horizontal bar chart
    plt.broken_barh(
        [(base, high_width)],
        (y - 0.4, 0.8),
        facecolors=['red', 'red'], # Try different colors if you like
        edgecolors=['black', 'black'],
        linewidth=1)
# Draw a vertical line down the middle
plt.axvline(base, color='black')
# Position the x-axis on the top/bottom, hide all the other spines (=axis lines)
axes = plt.gca() # (gca = get current axes)
axes.spines['left'].set_visible(False)
axes.spines['right'].set_visible(False)
axes.spines['top'].set_visible(False)
axes.xaxis.set_ticks_position('bottom')
# Make the y-axis display the variables
plt.yticks(ys, variables)
plt.ylim(-2, len(variables))
plt.show()
Many thanks in advance
Use the built-in zip function: it pairs up the i-th elements of each of the argument sequences or iterables into tuples (a list of tuples in Python 2, an iterator in Python 3). Be aware that the result is truncated to the length of the shortest argument. Zip the variables with the values before sorting, so that each label stays with its coefficient.
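A minimal sketch of that idea, with made-up coefficients standing in for the corr values from the question:

```python
variables = ['a1', 'b1', 'c1', 'a2']
values = [0.2, -0.9, 0.5, -0.1]  # made-up correlation coefficients

# sort the (label, value) pairs together by absolute value, largest first
pairs = sorted(zip(variables, values), key=lambda pair: abs(pair[1]), reverse=True)

# unzip back into two sequences that are still aligned with each other
sorted_variables, sorted_values = zip(*pairs)
```

sorted_values then takes the place of values in the plotting loop, and plt.yticks(ys, sorted_variables) labels each bar with its own variable.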
I am a little confused by the documentation for scipy.interpolate.RegularGridInterpolator.
Say for instance I have a function f: R^3 => R which is sampled on the vertices of the unit cube. I would like to interpolate so as to find values inside the cube.
import numpy as np
# Grid points / sample locations
X = np.array([[0,0,0], [0,0,1], [0,1,0], [0,1,1], [1,0,0], [1,0,1], [1,1,0], [1,1,1.]])
# Function values at the grid points
F = np.random.rand(8)
Now, RegularGridInterpolator takes a points argument, and a values argument.
points : tuple of ndarray of float, with shapes (m1, ), ..., (mn, )
The points defining the regular grid in n dimensions.
values : array_like, shape (m1, ..., mn, ...)
The data on the regular grid in n dimensions.
I interpret this as being able to call as such:
import scipy.interpolate as irp
rgi = irp.RegularGridInterpolator(X, F)
However, when I do so, I get the following error:
ValueError: There are 8 point arrays, but values has 1 dimensions
What am I misinterpreting in the docs?
Ok I feel silly when I answer my own question, but I found my mistake with help from the documentation of the original regulargrid lib:
https://github.com/JohannesBuchner/regulargrid
points should be a list of arrays that specifies how the points are spaced along each axis.
For example, to take the unit cube as above, I should set:
pts = ( np.array([0,1.]), )*3
or if I had data which was sampled at higher resolution along the last axis, I might set:
pts = ( np.array([0,1.]), np.array([0,1.]), np.array([0,0.5,1.]) )
Finally, values has to be of shape corresponding to the grid laid out implicitly by points. For example,
val_size = tuple(q.shape[0] for q in pts)  # a tuple, since map() returns an iterator in Python 3
vals = np.zeros( val_size )
# make an arbitrary function to test:
func = lambda pt: (pt**2).sum()
# collect func's values at grid pts
for i in range(pts[0].shape[0]):
    for j in range(pts[1].shape[0]):
        for k in range(pts[2].shape[0]):
            vals[i,j,k] = func(np.array([pts[0][i], pts[1][j], pts[2][k]]))
So finally,
rgi = irp.RegularGridInterpolator(points=pts, values=vals)
runs and performs as desired.
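Applied to the unit cube from the question, this might look as follows. It is only a sketch: F is replaced by np.arange(8.0) so the result is easy to check, and it assumes the vertices in X are listed with the last coordinate varying fastest (as they are in the question), so that a plain reshape puts each value at the right grid index.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# the eight vertices of the unit cube, last coordinate varying fastest
X = np.array([[0,0,0], [0,0,1], [0,1,0], [0,1,1],
              [1,0,0], [1,0,1], [1,1,0], [1,1,1.]])
F = np.arange(8.0)  # stand-in function values, one per vertex

axis = np.array([0., 1.])
# X is in C order, so reshaping F to (2, 2, 2) puts F[n] at grid index X[n]
rgi = RegularGridInterpolator((axis, axis, axis), F.reshape(2, 2, 2))

at_vertices = rgi(X)            # evaluating at the vertices reproduces F
centre = rgi([0.5, 0.5, 0.5])   # linear interpolation at the cube centre
```

With the default linear method, the value at the centre is the mean of the eight vertex values.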
Your answer is nicer, and it's perfectly OK for you to accept it. I'm just adding this as an "alternate" way to script it.
import numpy as np
import scipy.interpolate as spint
RGI = spint.RegularGridInterpolator
x = np.linspace(0, 1, 3) # or 0.5*np.arange(3.) works too
# populate the 3D array of values (re-using x because lazy)
X, Y, Z = np.meshgrid(x, x, x, indexing='ij')
vals = np.sin(X) + np.cos(Y) + np.tan(Z)
# make the interpolator, (list of 1D axes, values at all points)
rgi = RGI(points=[x, x, x], values=vals) # can also be [x]*3 or (x,)*3
tst = (0.47, 0.49, 0.53)
print(rgi(tst))
print(np.sin(tst[0]) + np.cos(tst[1]) + np.tan(tst[2]))
returns (approximately):
1.93765972087
1.92113615659
I have some data of a particle moving in a corridor with closed boundary conditions.
Plotting the trajectory leads to a zig-zag trajectory.
I would like to know how to prevent plot() from connecting the points where the particle comes back to the start. Something like the upper part of the picture, but without the "." markers.
The first idea I had was to find the index where the numpy array a[:-1]-a[1:] becomes positive and then plot from 0 to that index. But how would I get the index of the first occurrence of a positive element of a[:-1]-a[1:]?
Maybe there are some other ideas.
I'd take a different approach. First, I'd determine the jump points not by looking at the sign of the derivative, since the movement might go up or down, or even have some periodicity in it. Instead, I'd look for the points with the largest derivative.
Second, an elegant approach to have breaks in a plot line is to mask one value on each jump. Then matplotlib will make segments automatically. My code is:
import matplotlib.pyplot as plt
import numpy as np
xs = np.linspace(0., 100., 1000)  # the number of samples must be an integer
data = (xs*0.03 + np.sin(xs) * 0.1) % 1
plt.subplot(2,1,1)
plt.plot(xs, data, "r-")
#Make a masked array with jump points masked
abs_d_data = np.abs(np.diff(data))
mask = np.hstack([ abs_d_data > abs_d_data.mean()+3*abs_d_data.std(), [False]])
masked_data = np.ma.MaskedArray(data, mask)
plt.subplot(2,1,2)
plt.plot(xs, masked_data, "b-")
plt.show()
This gives the following result:
The disadvantage of course is that you lose one point at each break - but with the sampling rate you seem to have I guess you can trade this in for simpler code.
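If losing a point matters, a variant (sketched here, not part of the code above) is to insert a NaN between the two samples on either side of each jump instead of masking one of them; matplotlib breaks lines at NaN as well. The helper name break_at_jumps and the fixed threshold of 0.5 (half the period of the sample data) are assumptions for this example.

```python
import numpy as np

def break_at_jumps(x, y, threshold):
    """Return copies of x and y with a NaN inserted at every jump in y.

    A jump is any step between consecutive samples larger than threshold.
    """
    # index of the first sample after each jump
    jumps = np.nonzero(np.abs(np.diff(y)) > threshold)[0] + 1
    x_out = np.insert(np.asarray(x, dtype=float), jumps, np.nan)
    y_out = np.insert(np.asarray(y, dtype=float), jumps, np.nan)
    return x_out, y_out

xs = np.linspace(0., 100., 1000)
data = (xs * 0.03 + np.sin(xs) * 0.1) % 1
xs_broken, data_broken = break_at_jumps(xs, data, threshold=0.5)
# plt.plot(xs_broken, data_broken, "b-") then draws separate segments
```

All original samples are kept; the arrays just grow by one element per jump.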
To find where the particle has crossed the upper boundary, you can do something like this:
>>> import numpy as np
>>> a = np.linspace(0, 10, 50) % 5 # some sample data
>>> np.nonzero(np.diff(a) < 0)[0] + 1
array([25, 49])
>>> a[24:27]
array([ 4.89795918, 0.10204082, 0.30612245])
>>> a[48:]
array([ 4.79591837, 0. ])
np.diff(a) calculates the discrete difference of a, while np.nonzero finds the indices where the condition np.diff(a) < 0 is True, i.e., where the particle has moved downward.
To avoid the connecting line you will have to plot by segments.
Here's a quick way to plot by segments when the derivative of a changes sign:
import numpy as np
import matplotlib.pyplot as plt
a = np.linspace(0, 20, 50) % 5 # similar to Micheal's sample data
x = np.arange(50) # x scale
indices = np.where(np.diff(a) < 0)[0] + 1 # the same as Micheal's np.nonzero
# add 0 and len(a) as bounds so that the first and last segments are plotted too
bounds = np.concatenate(([0], indices, [len(a)]))
for start, stop in zip(bounds[:-1], bounds[1:]):
    plt.plot(x[start:stop], a[start:stop], 'b-')
plt.show()
Based on Thorsten Kranz's answer, here is a version that adds points to the original data wherever 'y' crosses the period. This is important if the density of data points isn't very high, e.g. np.linspace(0., 100., 100) vs. the original np.linspace(0., 100., 1000). The x positions of the curve transitions are linearly interpolated. Wrapped up in a function, it is:
import numpy as np
def periodic2plot(x, y, period=np.pi*2.):
    indexes = np.argwhere(np.abs(np.diff(y))>.5*period).flatten()
    index_shift = 0
    for i in indexes:
        i += index_shift
        index_shift += 3 # in every loop it adds 3 elements
        if y[i] > .5*period:
            x_transit = np.interp(period, np.unwrap(y[i:i+2], period=period), x[i:i+2])
            add = np.ma.array([ period, 0., 0.], mask=[0,1,0])
        else:
            # interpolate needs sorted xp = np.unwrap(y[i:i+2], period=period)
            x_transit = np.interp(0, np.unwrap(y[i:i+2], period=period)[::-1], x[i:i+2][::-1])
            add = np.ma.array([ 0., 0., period], mask=[0,1,0])
        x_add = np.ma.array([x_transit]*3, mask=[0,1,0])
        x = np.ma.hstack((x[:i+1], x_add, x[i+1:]))
        y = np.ma.hstack((y[:i+1], add, y[i+1:]))
    return x, y
Here is the code comparing it with Thorsten Kranz's original answer at a lower data-point density:
import matplotlib.pyplot as plt
x = np.linspace(0., 100., 100)
y = (x*0.03 + np.sin(x) * 0.1) % 1
#Thorsten Kranz: Make a masked array with jump points masked
abs_d_data = np.abs(np.diff(y))
mask = np.hstack([np.abs(np.diff(y))>.5, [False]])
masked_y = np.ma.MaskedArray(y, mask)
# Plot
plt.figure()
plt.plot(*periodic2plot(x, y, period=1), label='This answer')
plt.plot(x, masked_y, label='Thorsten Kranz')
plt.autoscale(enable=True, axis='both', tight=True)
plt.legend(loc=1)
plt.tight_layout()
I have to plot several "curves", each one composed of horizontal segments (or even points), using the matplotlib library.
I achieved this by separating the segments with NaNs.
This is my example (working) code:
from pylab import arange, randint, plot, show, nan, ylim, legend
n = 6
L = 25
# hold(True) is no longer needed (and no longer exists in recent matplotlib):
# successive plot() calls draw on the same axes by default
for i in range(n):
    x = arange(L, dtype=float) # generates a 1xL array of floats
    m = randint(1, L)
    x[randint(1, L, m)] = nan # set m values as NaN
    y = [n - i] * len(x) # constant y value
    plot(x, y, '.-')
leg = ['data_{}'.format(j+1) for j in range(n)]
legend(leg)
ylim(0, i + 2)
show()
(actually, I start from lists of integers: NaNs are added afterwards where integers are missing)
Problem: since each line requires an array of length L, this solution can be expensive in terms of memory if L is big, while the necessary and sufficient information is just the limits of the segments.
For example, for one line composed of 2 segments with limits (0, 500) and (915, 62000) it would be nice to do something like this:
niceplot([(0, 500), (915, 62000)], [(1, 1), (1, 1)])
(note: this, with plot instead of niceplot, is working code, but it does something else...)
4*2 values instead of 62000*2...
Any suggestions?
(this is my first question, be clement^^)
Is this something like what you wish to achieve?
import matplotlib.pyplot as plt
segments = {1: [(0, 500),
                (915, 1000)],
            2: [(0, 250),
                (500, 1000)]}
colors = {1: 'b', 2: 'r'}
for y in segments:
    col = colors.get(y, 'k')
    for seg in segments[y]:
        plt.plot(seg, [y, y], color=col)
I'm just defining the y values as keys and a list of line segments (xlo, xhi) to be plotted at each y value.
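If one line object per curve matters (for example so that the legend has one entry per curve, as in the question), the segment limits can also be turned into short NaN-separated arrays. This is only a sketch; segments_to_xy is a made-up helper name, not a matplotlib function. It needs three values per segment instead of one per integer:

```python
import numpy as np

def segments_to_xy(segments, y):
    """Turn [(xlo, xhi), ...] into NaN-separated x and y arrays for one plot() call."""
    x_out = []
    for xlo, xhi in segments:
        x_out.extend([xlo, xhi, np.nan])  # the NaN breaks the line after each segment
    x_out = np.array(x_out, dtype=float)
    # constant y wherever x is defined, NaN at the breaks
    y_out = np.where(np.isnan(x_out), np.nan, float(y))
    return x_out, y_out

x, y = segments_to_xy([(0, 500), (915, 62000)], y=1)
# plt.plot(x, y, '.-') draws both segments as a single line object
```

For the example from the question this gives arrays of length 6 rather than 62000.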