Plot sparsely populated 2d numpy array - python

From an iterative image pattern search with decreasing step size I have a 'quality' array. Due to the nature of the search pattern, the array is not fully filled. In the first iteration I use a step size of 10, find the best spot, and then search a ±10 XY range around it to find the true best spot. So most of the array has only every 10th slot filled, plus a small 'best' region that is densely filled. Now I want to plot this array, with the plot 'interpolated' where needed from the data in every 10th slot. For the search I initialize the array with a huge value; all my measurements are smaller, and later I use np.argmin(q). That works fine for searching, but it is bad for plotting: the dynamic range of the plot is lost.
Here is an example from an older version of the code that does exhaustive but unnecessarily long search :
And here is what I get with the optimized search :
Here is the piece of code that does the plots. (q is the quality array to plot)
fig= plt.figure(1)
im= plt.imshow(q[::-1], cmap='rainbow', interpolation='none', extent=[-search_size,search_size,-search_size,search_size])
fig.savefig(pfn(img_fn), bbox_inches='tight')
The issue may point back to the initialization of the array. Again as I do a minimum search I do this :
q = np.empty(shape=(2*search_size,2*search_size))
q.fill(+1e20)
q_min = 1e20
for xs in range(-search_size,+search_size,search_step):
    for ys in range(-search_size,+search_size,search_step):
        img_shift = np.zeros_like(img)
        img_shift[mom(ys):non(ys), mom(xs):non(xs)] = img[mom(-ys):non(-ys), mom(-xs):non(-xs)]
        d = np.absolute(img_shift - prev_img)[search_size:-search_size,search_size:-search_size]
        q[ys+search_size,xs+search_size] = np.sum(d)
        if q[ys+search_size,xs+search_size] < q_min : q_min= q[ys+search_size,xs+search_size]
        #print '1st iter try : %+3d %+3d %6.3f %6.3f' % ( xs, ys, q[ys+search_size,xs+search_size], q_min)
idxmin = np.argmin(q)
dy,dx = np.unravel_index(idxmin, q.shape)
dx= dx-search_size
dy= dy-search_size
print '1st iter best : dx= %+3d dy= %+3d' % ( dx , dy )
Then follows another loop with search_step = 1.
Is it possible to initialize the array with, e.g., NaN? Would that still allow the minimum search? And/or would it allow the plotter to skip across undefined entries?
So what's the best way to initialize and plot so that the search works and the plots look good?
Thanks,
Gert
Update #Nix G-D
The averaging fails. I first tried code following the recommendation.
q_int = pd.DataFrame(q).interpolate(method='linear', axis=0).values
fig= plt.figure(1)
im= plt.imshow(q_int[::-1], cmap='rainbow', interpolation='none', extent=[-search_size,search_size,-search_size,search_size])
However the 2D interpolation failed. (at least as indicated by the plot)
I tried to add code to perform X and Y interpolation.
q_intx = pd.DataFrame(q).interpolate(method='linear', axis=0).values
q_int = pd.DataFrame(q_intx).interpolate(method='linear', axis=1).values
fig= plt.figure(1)
im= plt.imshow(q_int[::-1], cmap='rainbow', interpolation='none', extent=[-search_size,search_size,-search_size,search_size])
But results still were corrupted.
Best,
Gert

You can initialize the array with NaN easily:
shape = (2*search_size, 2*search_size)
q = np.full(shape, np.nan)
This can then be searched as normal. To find the minimum indices ignoring NaNs, you can use np.nanargmin()
In [12]: np.nanargmin([1,-1,4,float('nan')])
Out[12]: 1
To fill in these NaN values for plotting, we can use pandas.DataFrame.interpolate():
q_interpolated = pd.DataFrame(q).interpolate(method='linear', axis=0).values
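A minimal sketch putting these pieces together, assuming a coarse grid like the one in the question. The values of search_size and search_step and the quadratic 'quality' function are made up for illustration; only np.full, np.nanargmin, np.unravel_index and DataFrame.interpolate come from the posts above. Interpolating along one axis leaves the all-NaN columns empty, so the sketch runs a second pass along the other axis.

```python
import numpy as np
import pandas as pd

search_size = 20   # hypothetical half-width of the search window
search_step = 10   # hypothetical coarse step size

# Start with NaN everywhere instead of a huge sentinel value.
q = np.full((2 * search_size, 2 * search_size), np.nan)

# Fill only every search_step-th slot with a made-up quality value.
for ys in range(-search_size, search_size, search_step):
    for xs in range(-search_size, search_size, search_step):
        q[ys + search_size, xs + search_size] = (xs - 3) ** 2 + (ys + 4) ** 2

# Minimum search that ignores the unfilled (NaN) slots.
iy, ix = np.unravel_index(np.nanargmin(q), q.shape)
best_dx = int(ix) - search_size
best_dy = int(iy) - search_size

# Two linear passes: down the columns first, then along the rows.
q_cols = pd.DataFrame(q).interpolate(method='linear', axis=0)
q_filled = q_cols.interpolate(method='linear', axis=1).to_numpy()
```

The search keeps working on q (with its NaNs); only the plot would use q_filled. Plotting q directly with imshow should also keep the dynamic range, because NaN cells are left blank rather than mapped to a colour.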

Related

How can I use the scipy.interpolate.interp1d in python to plot 2 y curves instead of 1?

I'm pretty new to Python. I have a piece of code that reads some data from a file, creates several arrays and plots them with plt.plot. The arrays are s for the x-axis, and P_abs_i and P_abs_e for the y-axis. The code was working fine until I tried to plot smooth lines instead of the default ones.
I tried to use the interpolate.interp1d function to plot smooth lines. I used np.array to turn my arrays into numpy arrays, following the example in the interp1d guide. I then used interp1d to create a cubic interpolation curve and np.linspace to get evenly spaced samples. It worked for one of the lines (P_abs_e), so I then tried to copy the same process for the other line (P_abs_i), but I got the error message: "ValueError: x and y must have same first dimension, but have shapes (500,) and (1, 500)". Can somebody help? (The code is below; I'm not sure if it's going to show properly since this is my first time posting.)
x_e = np.array(s)
y_e = np.array(P_abs_e)
cubic_interpolation_model_e = interp1d(x_e, y_e, kind = "cubic")
X_e=np.linspace(x_e.min(), x_e.max(), 500)
Y_e=cubic_interpolation_model_e(X_e)
plt.plot(X_e, Y_e, 'b', label = 'e')
x_i = np.array(s)
y_i = np.array(P_abs_i)
cubic_interpolation_model_i = interp1d(x_i, y_i, kind = "cubic")
X_i=np.linspace(x_i.min(), x_i.max(), 500)
Y_i=cubic_interpolation_model_i(X_i)
plt.plot(X_i, Y_i, 'g', label = 'He3')
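One possible reading of the error, offered as an assumption because the code that builds P_abs_i is not shown: if P_abs_i ends up as a nested list (one row inside another list), np.array gives it shape (1, N). interp1d interpolates along the last axis by default, so it accepts that input but returns shape (1, 500), which plt.plot then rejects. The sketch below reproduces the shapes with made-up data and flattens the array before interpolating.

```python
import numpy as np
from scipy.interpolate import interp1d

# Made-up stand-ins for the arrays read from the file.
s = np.linspace(0.0, 1.0, 10)
P_abs_i = [np.sin(2 * np.pi * s)]      # nested list: one row inside a list

x_i = np.asarray(s)
y_i_2d = np.asarray(P_abs_i)           # shape (1, 10), not (10,)
X_i = np.linspace(x_i.min(), x_i.max(), 500)

# Interpolating the 2-D array keeps the extra leading axis.
Y_bad = interp1d(x_i, y_i_2d, kind="cubic")(X_i)

# Flattening first gives a 1-D result that matches X_i.
y_i = y_i_2d.ravel()
Y_i = interp1d(x_i, y_i, kind="cubic")(X_i)
```

Printing y_i.shape right after the np.array call is a quick way to check whether this is what happens in the real script.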

When converting polar to cartesian, a slice of the pie is missing

I have done measurements on external software where I do my measurements in the cylindrical coordinates R, phi and z. However, I select one z, to make a contourplot over so I have coordinates in R and phi. To turn that into x and y, I make a 2D array of x and y with x being equal to R * cos(phi) and y to R * sin(phi). Like this:
t_xray = np.zeros((Rbins, Phibins))
t_yray = np.zeros((Rbins, Phibins))
for i in range(0, Rbins):
    for j in range(0, Phibins):
        t_xray[i,j] = Rray[i] * np.cos(Phiray[j])
        t_yray[i,j] = Rray[i] * np.sin(Phiray[j])
with Rbins and Phibins being equal to the length of the arrays of R's and phi's. Seems like a legitimate way to get it done, right? Apparently not, as this is what my plot looks like:
(Plot with a slice of the pie missing.) It was made with:
plt.contourf(t_xray, t_yray, Doos_TG43, 1000, locator = ticker.LogLocator())
cbar = plt.colorbar(label = r'$\it{D}$ (cGy$\cdot$ h$^{-1}$)')
My first thought was that there was somehow a bigger leap in-between two angles that Python couldn't interpolate in-between, but when printing the array of phi's, you can see the leap between the first and last angle in the array is the same as in-between any element of the array (assuming we count k2pi + phi as phi):
[0.03141593 0.09424778 0.15707963 0.21991149 0.28274334 0.34557519
0.40840704 0.4712389 0.53407075 0.5969026 0.65973446 0.72256631
0.78539816 0.84823002 0.91106187 0.97389372 1.03672558 1.09955743
1.16238928 1.22522113 1.28805299 1.35088484 1.41371669 1.47654855
1.5393804 1.60221225 1.66504411 1.72787596 1.79070781 1.85353967
1.91637152 1.97920337 2.04203522 2.10486708 2.16769893 2.23053078
2.29336264 2.35619449 2.41902634 2.4818582 2.54469005 2.6075219
2.67035376 2.73318561 2.79601746 2.85884931 2.92168117 2.98451302
3.04734487 3.11017673 3.17300858 3.23584043 3.29867229 3.36150414
3.42433599 3.48716785 3.5499997 3.61283155 3.6756634 3.73849526
3.80132711 3.86415896 3.92699082 3.98982267 4.05265452 4.11548638
4.17831823 4.24115008 4.30398194 4.36681379 4.42964564 4.49247749
4.55530935 4.6181412 4.68097305 4.74380491 4.80663676 4.86946861
4.93230047 4.99513232 5.05796417 5.12079603 5.18362788 5.24645973
5.30929158 5.37212344 5.43495529 5.49778714 5.560619 5.62345085
5.6862827 5.74911456 5.81194641 5.87477826 5.93761012 6.00044197
6.06327382 6.12610567 6.18893753 6.25176938]
So it seems I am completely out of the loop here. Why does it look as if a slice is cut from the 'pie', despite everything I just mentioned?
To summarise, I checked whether the problem lies with the angles, but they look evenly spaced and that does not bring back the slice. I have no idea what causes a piece to go missing.
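One plausible explanation, offered as an assumption since no answer is shown here: contourf only fills between neighbouring grid columns and does not know that the last angle wraps around to the first, so the wedge between the last angle (about 6.2518) and the first angle plus 2π stays empty. A minimal sketch of closing that seam is to append the first angle (shifted by 2π) and the first data column before building the x and y grids. The radii and dose values below are made up; Doos stands in for Doos_TG43.

```python
import numpy as np

# Made-up stand-ins: 10 radii and 100 angular bin centres like the printed array.
Rray = np.linspace(0.5, 5.0, 10)
Phiray = (np.arange(100) + 0.5) * 2 * np.pi / 100
Doos = np.outer(1.0 / Rray**2, 1.0 + 0.1 * np.cos(Phiray))   # hypothetical dose grid

# Close the seam: repeat the first angle (plus 2*pi) and the first data column.
Phi_closed = np.append(Phiray, Phiray[0] + 2 * np.pi)
Doos_closed = np.concatenate([Doos, Doos[:, :1]], axis=1)

# Vectorised equivalent of the double loop in the question.
t_xray = np.outer(Rray, np.cos(Phi_closed))
t_yray = np.outer(Rray, np.sin(Phi_closed))
```

Passing t_xray, t_yray and Doos_closed to plt.contourf should then fill the full circle, because the last column now coincides with the first.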

Pyplot - show x-axis labels according to y-axis value

I have a 1 min 20 s long video recording at 23.813 FPS. More precisely, I have 1923 frames which I have scanned for desired features. I detected some specific behavior with a neural network and, using a chosen metric, calculated a value for each frame.
So, now, I have X-Y values to plot a graph:
X: time (each step is 0.041993869 s)
Y: a value measured by neural network
In the default state, the plot looks like this:
So I tried to limit the number of bins in the hope that the bins would be spread over all my values. But they are not. As you can see, only the first fifteen x-values are rendered:
pyplot.locator_params(axis='x', nbins=15)
But neither of these is the desired state. The desired plot should show x-axis labels only for bins whose y-value is higher than, e.g., 1.2. So it should look like this:
Is it possible to achieve such a result?
Code:
# draw plot
from pandas import read_csv
from matplotlib import pyplot
test_video_fps = 23.813
df = read_csv('/path/to/csv/file/file.csv', header=None)
df.columns = ['anomaly']
df['time'] = [round((i + 1) / test_video_fps, 2) for i in range(df.shape[0])]
axes = df.plot.bar(x='time', y='anomaly', rot='0')
# pyplot.locator_params(axis='x', nbins=15)
# axes.get_xaxis().set_visible(False)
fig = pyplot.gcf()
fig.set_size_inches(16, 10)
fig.savefig('/path/to/output/plot.png', dpi=100)
# pyplot.show()
Example:
Simple example with a subset of original data.
0.379799
0.383786
0.345488
0.433286
0.469474
0.431993
0.474253
0.418843
0.491070
0.447778
0.384890
0.410994
0.898229
1.872756
2.907009
3.691382
4.685749
4.599612
3.738768
8.043357
7.660785
2.311198
1.956096
2.877326
3.467511
3.896339
4.250552
6.485533
7.452986
7.103761
2.684189
2.516134
1.512196
1.435303
0.852047
0.842551
0.957888
0.983085
0.990608
1.046679
1.082040
1.119655
0.962391
1.263255
1.371034
1.652812
2.160451
2.646674
1.460051
1.163745
0.938030
0.862976
0.734119
0.567076
0.417270
Desired plot:
Your question has become a two-part problem, but it is interesting enough that I will answer both.
I will answer this in Matplotlib object oriented notation with numpy data rather than pandas. This will make things easier to explain, and can be easily generalized to pandas.
I will assume that you have the following two data arrays:
dt = 0.041993869
x = np.arange(0.0, 15 * dt, dt)
y = np.array([1., 1.1, 1.3, 7.6, 2.4, 0.8, 0.7, 0.8, 1.0, 1.5, 10.0, 4.5, 3.2, 0.9, 0.7])
Part 1: Identifying the locations where you want labels
The data can be masked to get the locations of the peaks:
mask = y > 1.2
Consecutive peaks can be easily eliminated by computing the diff. A diff of a boolean mask will be True at the locations where the mask changes sense. You will then have to take every other element to get the locations where it goes from False to True. The following code will capture all the corner cases where you start with a peak or end in the middle of a peak:
d = np.flatnonzero(np.diff(mask))
if mask[d[0]]: # First diff is end of peak: True to False
    d = np.concatenate(([0], d[1::2] + 1))
else:
    d = d[::2] + 1
d is now an array of indices into x and y that represent the first element of each run of peaks. You can get the last element by swapping the indices [1::2] and [::2] in the if-else statement, and removing the + 1 in both cases.
The locations of the labels are now simply x[d].
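As a cross-check on Part 1, here is a small variant that finds the same start indices by looking for rising edges directly instead of taking every other diff. The helper name peak_starts is made up for this sketch. It pads the mask with a leading False, so a peak at index 0 and data with no peak at all are both handled without a special case.

```python
import numpy as np

def peak_starts(y, threshold):
    """Return the index of the first element of each run where y > threshold."""
    mask = np.asarray(y) > threshold
    # Prepend False so a run that starts at index 0 still has a rising edge.
    padded = np.concatenate(([False], mask))
    # A run starts where the current element is True and the previous one is False.
    return np.flatnonzero(padded[1:] & ~padded[:-1])

dt = 0.041993869
y = np.array([1., 1.1, 1.3, 7.6, 2.4, 0.8, 0.7, 0.8, 1.0, 1.5, 10.0, 4.5, 3.2, 0.9, 0.7])
x = np.arange(len(y)) * dt

d = peak_starts(y, 1.2)
label_positions = x[d]
```

For the sample data, d holds the indices 2 and 9, the first elements of the two runs above 1.2.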
Part 2: Locating and formatting the labels
For this part, you will need to access Matplotlib's object oriented API via the Axes object you are plotting on. You already have this in the pandas form, making the transfer easy. Here is a sample in raw Matplotlib:
fig, axes = plt.subplots()
axes.plot(x, y)
Now use the ticker API to easily set the locations and labels. You actually set the locations directly (not with a Locator) since you have a very fixed list of ticks:
axes.set_xticks(x[d])
axes.xaxis.set_major_formatter(ticker.StrMethodFormatter('{x:0.01g}s'))
For the sample data shown here, you get

How can I set the colorbar limits for a yt.SlicePlot using set_cmap?

I am totally new to Python and I am totally lost.
My supervisor helped me to generate a script to see some slices of a 3D velocity model:
import numpy as np
import matplotlib.pyplot as plt
import yt
from yt.units import km
# Import and reshape data
d = np.genfromtxt('velocity_model.txt', delimiter=' ')
nd=22
nx=131
vel = d[:,3].reshape(nd,nx,nx)
lat = d[:,0].reshape(nd,nx,nx)
lon = d[:,1].reshape(nd,nx,nx)
dep = d[:,2].reshape(nd,nx,nx)
# When this is read into YT, depth increases along x axis, longitude increases along y axis and latitude increases along z axis, need to swap x and z and then flip z
dep=dep.swapaxes(0,2) # swap first and third dimensions: gives lon (x), lat (y), depth (z)
vel=vel.swapaxes(0,2) # swap first and third dimensions:
lat=lat.swapaxes(0,2) # swap first and third dimensions:
lon=lon.swapaxes(0,2) # swap first and third dimensions:
dep=dep[:,:,::-1] # reverse z direction
vel=vel[:,:,::-1] # reverse z direction
lat=lat[:,:,::-1] # reverse z direction
lon=lon[:,:,::-1] # reverse z direction
xmin=0
xmax=289
ymin=0
ymax=289
zmin=-100
zmax=5
# Load into yt
data=dict(velocity=(vel,'km/s'),latitude=(lat,'deg'),longitude=(lon,'deg'),depth=(dep,'km'))
bbox = np.array([[xmin,xmax], [ymin,ymax], [zmin,zmax]])
ds=yt.load_uniform_grid(data,vel.shape, length_unit='km', bbox=bbox)
#Off-Axis Slice
for key in ['latitude','longitude','depth','velocity'] :
    L = [0,0,1] # cutting plane=z
    slicepos=-50
    c = [(xmax-xmin)/2, (ymax-ymin)/2, slicepos]
    cut = yt.SlicePlot(ds, L, key,origin='native',center=c) #, width=(200,90,'km'))
    cut.set_log(key, False)
    cut.annotate_text([0.5,0.9],'z={:d} km'.format(slicepos),coord_system='axis')
    cut.set_cmap(field='velocity',cmap='jet_r')
    cut.save()
With this script, I would like to fix the colorbar range, because it changes for each image, which makes the images hard to interpret.
I tried to add limits like this:
h=colorbar
h.Limits = [5 9]
cut.set_cmap(field='velocity',cmap='jet_r', h)
But that's not the right way. Does someone have an idea? I saw a lot of things, but nothing for cmap.
You're looking for the set_zlim function:
http://yt-project.org/doc/reference/api/generated/yt.visualization.plot_window.AxisAlignedSlicePlot.set_zlim.html
The set_cmap function just allows you to choose which colormap you want, it does not allow you to set the colormap range. You need to use set_zlim for that. Here's an example, using one of the sample datasets from http://yt-project.org/data:
import yt
ds = yt.load('IsolatedGalaxy/galaxy0030/galaxy0030')
plot = yt.SlicePlot(ds, 2, 'density')
plot.set_cmap('density', 'viridis')
plot.set_zlim('density', 1e-28, 1e-25)
This produces the following image:
This is really a question about the yt visualization library rather than matplotlib per se - I've edited the title and tags to reflect this.
I have never come across yt before, but based on the official documentation for yt.SlicePlot, it seems that cut will either be an AxisAlignedSlicePlot or an OffAxisSlicePlot object. Both of these classes have a .set_zlim() method that seems to do what you want:
AxisAlignedSlicePlot.set_zlim(*args, **kwargs)
    set the scale of the colormap
    Parameters:
        field : string
            the field to set a colormap scale. If field == ‘all’, applies to all plots.
        zmin : float
            the new minimum of the colormap scale. If ‘min’, will set to the minimum value in the current view.
        zmax : float
            the new maximum of the colormap scale. If ‘max’, will set to the maximum value in the current view.
    Other Parameters:
        dynamic_range : float (default: None)
            The dynamic range of the image. If zmin == None, will set zmin = zmax / dynamic_range. If zmax == None, will set zmax = zmin * dynamic_range. When dynamic_range is specified, defaults to setting zmin = zmax / dynamic_range.
In other words, you could probably use:
cut.set_zlim(field='velocity', zmin=5, zmax=9)

plotting high precision data

I have an array which contains error values as a function of two different quantities (alpha and eigRange).
I fill my array like this :
for j in range(n):
    for i in range(alphaLen):
        alpha = alpha_list[i]
        c = train.eig(xt_, yt_,m-j, m,alpha, "cpu")
        costListTrain[j, i] = cost.err(xt_, xt_, yt_, c)
normedValues=costListTrain/np.max(costListTrain.ravel())
where
n = 20
alpha_list = [0.0001,0.0003,0.0008,0.001,0.003,0.006,0.01,0.03,0.05]
My costListTrain array contains some values that have very small differences, e.g.:
2.809458902485728 2.809458905776425 2.809458913576337 2.809459011062461
2.030326752376704 2.030329906064879 2.030337351188699 2.030428976282031
1.919840839066182 1.919846470077076 1.919859731440199 1.920021453630778
1.858436351617677 1.858444223016128 1.858462730482461 1.858687054377165
1.475871326997542 1.475901926855846 1.475973476249240 1.476822830933632
1.475775410801635 1.475806023102173 1.475877601316863 1.476727286424228
1.475774284270633 1.475804896751524 1.475876475382906 1.476726165223209
1.463578292548192 1.463611627166494 1.463689466240788 1.464609083309240
1.462859608038034 1.462893157900139 1.462971489632478 1.463896516033939
1.461912706143012 1.461954067956570 1.462047793798572 1.463079574605320
1.450581041157659 1.452770209885761 1.454835202839513 1.459676311335618
1.450581041157643 1.452770209885764 1.454835202839484 1.459676311335624
1.450581041157651 1.452770209885735 1.454835202839484 1.459676311335610
1.450581041157597 1.452770209885784 1.454835202839503 1.459676311335620
1.450581041157575 1.452770209885757 1.454835202839496 1.459676311335619
1.450581041157716 1.452770209885711 1.454835202839499 1.459676311335613
1.450581041157667 1.452770209885744 1.454835202839509 1.459676311335625
1.450581041157649 1.452770209885750 1.454835202839476 1.459676311335617
1.450581041157655 1.452770209885708 1.454835202839442 1.459676311335622
1.450581041157571 1.452770209885700 1.454835202839498 1.459676311335622
As you can see here, the values are very, very close together!
I am trying to plot this data so that the two quantities are on the x and y axes and the error value is represented by the dot color.
This is how I'm plotting my data:
alpha_list = np.log(alpha_list)
eigenvalues, alphaa = np.meshgrid(eigRange, alpha_list)
vMin = np.min(costListTrain)
vMax = np.max(costListTrain)
plt.scatter(x, y, s=70, c=normedValues, vmin=vMin, vmax=vMax, alpha=0.50)
but the result is not correct.
I tried to normalize my error value by dividing all values by the max, but it didn't work !
The only way I could make it work (which is incorrect) is to normalize my data in two different ways. One is based on each column (factor 1 is constant, factor 2 changes), and the other is based on each row (factor 2 is constant, factor 1 changes). But that doesn't really make sense, because I need a single plot that shows how the tradeoff between the two quantities affects the error values.
UPDATE
this is what I mean by last paragraph.
Normalizing values based on the max of each row, which corresponds to eigenvalues:
maxsEigBasedTrain= np.amax(costListTrain.T,1)[:,np.newaxis]
maxsEigBasedTest= np.amax(costListTest.T,1)[:,np.newaxis]
normEigCostTrain=costListTrain.T/maxsEigBasedTrain
normEigCostTest=costListTest.T/maxsEigBasedTest
Normalizing values based on the max of each column, which corresponds to alphas:
maxsAlphaBasedTrain= np.amax(costListTrain,1)[:,np.newaxis]
maxsAlphaBasedTest= np.amax(costListTest,1)[:,np.newaxis]
normAlphaCostTrain=costListTrain/maxsAlphaBasedTrain
normAlphaCostTest=costListTest/maxsAlphaBasedTest
plot 1:
where no. eigenvalue = 10 and alpha changes (should correspond to column 10 of plot 1) :
where alpha = 0.0001 and eigenvalues change (should correspond to first row of plot1)
but as you can see the results are different from plot 1!
UPDATE:
just to clarify more stuff this is how I read my data:
from sklearn.datasets.samples_generator import make_regression
rng = np.random.RandomState(0)
diabetes = datasets.load_diabetes()
X_diabetes, y_diabetes = diabetes.data, diabetes.target
X_diabetes=np.c_[np.ones(len(X_diabetes)),X_diabetes]
ind = np.arange(X_diabetes.shape[0])
rng.shuffle(ind)
#===============================================================================
# Split Data
#===============================================================================
import math
cross= math.ceil(0.7*len(X_diabetes))
ind_train = ind[:cross]
X_train, y_train = X_diabetes[ind_train], y_diabetes[ind_train]
ind_val=ind[cross:]
X_val,y_val= X_diabetes[ind_val], y_diabetes[ind_val]
I also uploaded .csv files HERE
log.csv contain the original value before normalization for plot 1
normalizedLog.csv for plot 1
eigenConst.csv for plot 2
alphaConst.csv for plot 3
I think I found the answer. First of all, there was a problem in my code. I was expecting the "No. of eigenvalue" to correspond to rows, but in my for loop they fill the columns. The correct version is this:
for i in range(alphaLen):
    for j in range(n):
        alpha=alpha_list[i]
        c=train.eig(xt_, yt_,m-j,m,alpha,"cpu")
        costListTrain[i,j]=cost.err(xt_,xt_,yt_,c)
        costListTest[i,j]=cost.err(xt_,xv_,yv_,c)
After asking questions from friends and colleagues I got this answer :
I would assume on default imshow and other plotting commands you
might want to use, do equally sized intervals on the values you are
plotting. if you can set that to logarithmic you should be fine.
Ideally, equally "populated bins" would proof most effective, i guess.
For plotting, I just subtract the min value from the error, then add a small number, and at the end take the log.
temp=costListTrain- costListTrain.min()
temp+=0.00000001
extent = [0, 20,alpha_list[0], alpha_list[-1]]
plt.imshow(np.log(temp),interpolation="nearest",cmap=plt.get_cmap('spectral'), extent = extent, origin="lower")
plt.colorbar()
and result is :
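To make the effect of that transform concrete, here is a small sketch on three rows copied from the data printed in the question. The variable names are made up for illustration. It compares plain max-normalization with the shift-and-log approach: the log stretches the values closest to the global minimum, which is where the original colour scale could not separate them. Values far from the minimum (such as the first row) remain close together either way.

```python
import numpy as np

# Three rows taken from the cost values printed in the question.
cost = np.array([
    [2.809458902485728, 2.809458905776425, 2.809458913576337, 2.809459011062461],
    [1.475871326997542, 1.475901926855846, 1.475973476249240, 1.476822830933632],
    [1.450581041157659, 1.452770209885761, 1.454835202839513, 1.459676311335618],
])

eps = 1e-8                                   # small offset so log(0) never occurs
linear = cost / cost.max()                   # plain max-normalization
log_cost = np.log(cost - cost.min() + eps)   # shift to the minimum, then take the log

# Spread of the bottom row (the one containing the global minimum) in each scaling.
bottom_linear = np.ptp(linear[2])
bottom_log = np.ptp(log_cost[2])
```

Because the log is monotonic, the position of the minimum does not change; only the colour resolution near it does.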
