Matplotlib Pyplot not plotting properly in a for loop - python

tl;dr trying to create plots using for loop in Python 3 but it is resulting in weird inconsistent plots.
I am trying to create plots like Figure 2 of Fox et al. 2015. I have the different ion lines saved in a dictionary with their parameters saved as lists. For example,
# The way this dictionary is defined is as follows: ion name:
# [wavelength, f-value, continuum lower 1, continuum higher 1, continuum lower 2,
# continuum higher 2, subplot x, subplot y, ion name x, ion name y]
DictIon = {
    'C II 1334': [1334.5323,.1278,-400,-200,300,500,0,0,-380,.2],
    'Si II 1260': [1260.4221,1.007,-500,-370,300,500,2,0,-380,.15],
    'Si II 1193': [1193.2897,0.4991,-500,-200,200,500,3,0,-380,.2]
}
In order to generate the plot, I have written the following code where at first I calculate the continuum of the absorption lines and then use data from a Voigt Profile algorithm that gives me the zval and bval that I use in the Voigt profile section of the code, and finally I combine these two and plot the values in appropriate subplots. The code is:
f, axes = plt.subplots(6, 2, sharex='col', sharey='row',figsize = (15,15))
for ion in DictIon:
w0 = DictIon[ion][0] #Rest wavelength
fv = DictIon[ion][1] #Oscillator strength/f-value
velocity = (wavelength-Lambda)/Lambda*c
#Fit the continuum
low1 = DictIon[ion][2] #lower bound for continuum fit on the left side
high1 = DictIon[ion][3] #upper bound for continuum fit on the left side
low2 = DictIon[ion][4] #lower bound for continuum fit on the right side
high2 = DictIon[ion][5] #upper bound for continuum fit on the right side
x1 = velocity[(velocity>=low1) & (velocity<=high1)]
x2 = velocity[(velocity>=low2) & (velocity<=high2)]
X = np.append(x1,x2)
y1 = flux[(velocity>=low1) & (velocity<=high1)]
y2 = flux[(velocity>=low2) & (velocity<=high2)]
Y = np.append(y1,y2)
Z = np.polyfit(X,Y,1)
#Generate data to plot continuum
xp = np.linspace(-500,501,len(flux[(velocity>=-500) & (velocity<=500)]))
p = np.poly1d(Z)
#Normalize flux
norm_flux = flux[(velocity>=-500) & (velocity<=500)]/p(xp)
#Create a line at y=1
tmp1 = np.linspace(-500,500,10)
tmp2 = np.full((1,10),1)[0]
'''Generate Voigt Profile Fits'''
#Initialize arrays
vmod = (np.arange(npix+1)-(npix/pixsize))*pixsize #-npix to npix in steps of pix
fmodraw = np.ndarray((npix+1)); fmodraw.fill(1.0)
ncom = len(zval)+1
fitn = 10**(logn); fitne = 10**(logn+elogn)-10**logn
etotcol=np.sqrt(np.sum(elogn**2)) #strictly only true if independent
#Set up arrays
sigma=bval/np.sqrt(2.0); tau0=np.ndarray((ncom)); tauv=np.ndarray((ncom,npix))
find=np.ndarray((ncom, npix)); sfind=np.ndarray((ncom, npix)) #smoothed
#go from z to velocity
v0=c*(zval-zmod) / (1.0+zmod); ev0=c*zvale / (1.0+zmod)
bv=bval; ebv=bvale
#generate models for each comp, where tau is Gaussian with velocity
for k in range(0, ncom-1):
tau0[k]=(1.497e-2*(10**logn[k])*(w0/1.0e8)*fv) / (bval[k]*1.0e5)
for j in range(0, npix-1):
tauv[k][j]=tau0[k]*np.exp((-1.0*(vmod[j]-v0[k])**2) / (2.0*sigma[k]**2))
tauv = tauv.T
find = find.T
#Sum over components (pixel by pixel)
for j in range(0, npix-1):
#create Gaussian kernel (smoothing function or LSF)
#created on 1 km/s grid with default FWHM=20.0 km/s (UVES), integral=1
sigins=fwhmins/(1.414*1.665); nker=150 #NEED TO FIND NKER
vt=np.arange(nker)-nker/2 #-75 to +75 in 1 km/s steps
#convolve total and individual comps with LSF
fmod = np.convolve(fmodraw, smfn, mode='same')
axes[DictIon[ion][6]][DictIon[ion][7]].plot(vmod,fmod,'r', linewidth = 1.5)
axes[DictIon[ion][6]][DictIon[ion][7]].step(velocity[(velocity>=-500) & (velocity<=500)],norm_flux,'k')
Instead of producing plots as Figure 2 of Fox et al. 2015, I am getting situations like this where the code will produce different results at different times when I run it:
Bottom line, I have tried to debug this for 3 days now and I am at a loss. I suspect that it may have something to do with how pyplot plots work in for loop and the fact that I am using a dictionary to loop through. Any advice or suggestions would be greatly appreciated. I am using Python 3.
Data available here:
zval,bval values:
velocity, flux values:
Si III 1206:
Si IV 1393:

Here's a MWE of the operation you want to perform:
import matplotlib.pyplot as plt
import numpy as np
f, axes = plt.subplots(3, 2, sharex='col', sharey='row',figsize=(15,15))
data_dict = {'C II 1334': [1., 2.],
'Si II 1260': [2., 3.],
'Si II 1193': [3., 4.]}
for i, (key, value) in enumerate(data_dict.items()):
    print(i, key, value)
    x = np.linspace(0, 100, 10)
    y0 = value[0] * x
    y1 = value[1] * x
    axes[i, 0].plot(x, y0, label=key)
    axes[i, 1].plot(x, y1, label=key)
    axes[i, 0].legend(loc="upper right")
    axes[i, 1].legend(loc="upper right")
Results in
I don't see any strange behaviour after invoking plt in a for loop over a dict.
I suggest you separate data processing/calculations, i.e. calculations of the scientific quantities of interest, from plotting said data.
Note that before Python 3.7 the ordering of dictionary items isn't guaranteed to be preserved, so on older versions it is better to use a list or an ordered dictionary in my example.
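To illustrate the suggestion of separating the calculations from the plotting, here is a minimal sketch. The function compute_profile and the parameter tuples are hypothetical stand-ins for the real continuum and Voigt-profile calculations, and a list of (name, parameters) pairs is used so that the panel order is fixed regardless of Python version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def compute_profile(params, x):
    """Hypothetical stand-in for the per-ion science calculation."""
    slope, offset = params
    return slope * x + offset


# A list of pairs keeps a fixed, explicit panel order.
ions = [
    ("C II 1334", (1.0, 0.0)),
    ("Si II 1260", (2.0, 1.0)),
    ("Si II 1193", (3.0, 2.0)),
]

x = np.linspace(-500, 500, 50)

# Step 1: do all the calculations first, with no plotting involved.
results = {name: compute_profile(params, x) for name, params in ions}

# Step 2: plot the finished results, one panel per ion.
fig, axes = plt.subplots(len(ions), 1, sharex=True, squeeze=False)
for ax, (name, _) in zip(axes[:, 0], ions):
    ax.plot(x, results[name], label=name)
    ax.legend(loc="upper right")
```

With this layout, any inconsistency between runs can be checked on the results dictionary alone, before matplotlib is involved at all.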


Estimate joint density with 2d Gaussian kernel

I have the following data set where I have to estimate the joint density of 'bwt' and 'age' using kernel density estimation with a 2-dimensional Gaussian kernel and width h=5. I can't use modules such as scipy, which have ready-made functions for this, so I have to build the functions to calculate the density myself. Here's what I've gotten so far.
import numpy as np
import pandas as pd
babies_full = pd.read_csv("", sep='\t')
#Getting the columns I need
babies_full1=babies_full[['gestation', 'age']]
#2d Gaussian kernel
def k_2dgauss(x):
return np.exp(-np.sum(x**2, 1)/2) / np.sqrt(2*np.pi)
#Multivariate kernel density
def mv_kernel_density(t, x, h):
d = x.shape[1]
return np.mean(k_2dgauss((t - x)/h))/h**d
t = np.linspace(1.0, 5.0, 50)
print(mv_kernel_density(t, x, h))
However, I get the error 'ValueError: operands could not be broadcast together with shapes (50,) (1173,2)', which I think is caused by the different shapes of the arrays. I also don't understand why k_2dgauss(x) returns an array of zeros for me, since it should only return one value. In general, I am new to the concept of kernel density estimation and I don't really know if I've written the functions right, so any hints would help!
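For reference, the broadcasting error arises because t has shape (50,) while x has shape (n, 2); each evaluation point needs two coordinates. Below is a minimal NumPy-only sketch of a 2-D Gaussian kernel density estimate. It assumes a single bandwidth h for both dimensions and uses synthetic data in place of the real file; the names gauss_kernel_2d and kde_2d are illustrative. Note that the 2-D standard normal kernel is normalised by 2*pi rather than sqrt(2*pi).

```python
import numpy as np


def gauss_kernel_2d(u):
    """Standard bivariate normal density; u has shape (..., 2)."""
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2.0 * np.pi)


def kde_2d(t, x, h):
    """Kernel density estimate at points t (m, 2) from data x (n, 2)."""
    # Broadcast to shape (m, n, 2): every evaluation point against every datum.
    diff = (t[:, None, :] - x[None, :, :]) / h
    # Average the kernel over the data and rescale by h**d with d = 2.
    return gauss_kernel_2d(diff).mean(axis=1) / h**2


# Synthetic stand-in for the (gestation, age) columns (an assumption).
rng = np.random.default_rng(0)
x = rng.normal(loc=[280.0, 27.0], scale=[15.0, 6.0], size=(500, 2))

# Evaluate the density on a regular grid covering the data.
g1 = np.linspace(x[:, 0].min(), x[:, 0].max(), 40)
g2 = np.linspace(x[:, 1].min(), x[:, 1].max(), 30)
G1, G2 = np.meshgrid(g1, g2, indexing="ij")
t = np.column_stack([G1.ravel(), G2.ravel()])
density = kde_2d(t, x, h=5.0).reshape(G1.shape)
```

The result is one density value per grid point, so it can be passed directly to a contour or pcolormesh plot.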
Following on from my comments on your original post, I think this is what you want to do, but if not then come back to me and we can try again.
# info supplied by OP
import numpy as np
import pandas as pd
babies_full = pd.read_csv("", sep='\t')
#Getting the columns I need
babies_full1=babies_full[['gestation', 'age']]
# my contributions
from math import floor, ceil
def binMaker(arr, base):
    """function I already use for this sort of thing.
    arr is the arr I want to make bins for
    base is the bin separation, but does require you to import floor and ceil
    otherwise you can make these bins manually yourself"""
    binMin = floor(arr.min() / base) * base
    binMax = ceil(arr.max() / base) * base
    return np.arange(binMin, binMax + base, base)
bins1 = binMaker(x[:,0], 20.) # bins from 140. to 360. spaced 20 apart
bins2 = binMaker(x[:,1], 5.) # bins from 15. to 45. spaced 5. apart
counts = np.zeros((len(bins1)-1, len(bins2)-1)) # empty array for counts to go in
for i in range(0, len(bins1)-1): # loop over the intervals, hence the -1
    boo = (x[:,0] >= bins1[i]) * (x[:,0] < bins1[i+1])
    for j in range(0, len(bins2)-1): # loop over the intervals, hence the -1
        counts[i,j] = np.count_nonzero((x[boo,1] >= bins2[j]) *
                                       (x[boo,1] < bins2[j+1]))
# if you want your PDF to be a fraction of the total
# rather than the number of counts, do the next line
counts /= x.shape[0]
# plotting
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# setting the levels so that each number in counts has its own colour
levels = np.linspace(-0.5, counts.max()+0.5, int(counts.max())+2)
cmap = plt.get_cmap('viridis') # or any colormap you like
norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
fig, ax = plt.subplots(1, 1, figsize=(6,5), dpi=150)
pcm = ax.pcolormesh(bins2, bins1, counts, ec='k', lw=1)
fig.colorbar(pcm, ax=ax, label='Counts (%)')
plt.title('Manually making a 2D (joint) PDF')
If this is what you wanted, then there is an easier way with np.histogram2d, although I think you specified it had to be using your own methods, and not built-in functions. I've included it anyway for completeness' sake.
pdf = np.histogram2d(x[:,0], x[:,1], bins=(bins1,bins2))[0]
pdf /= x.shape[0] # again for normalising and making a percentage
levels = np.linspace(-0.5, pdf.max()+0.5, int(pdf.max())+2)
cmap = plt.get_cmap('viridis') # or any colormap you like
norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
fig, ax = plt.subplots(1, 1, figsize=(6,5), dpi=150)
pcm = ax.pcolormesh(bins2, bins1, pdf, ec='k', lw=1)
fig.colorbar(pcm, ax=ax, label='Counts (%)')
plt.title('using np.histogram2d to make a 2D (joint) PDF')
Final note - in this example, the only place where counts doesn't equal pdf is the bin with 40 <= age < 45 and 280 <= gestation < 300. This is due to how, in my manual case, I've used >= and <, whereas np.histogram2d treats the last bin as closed on its right edge, so a value equal to the final bin edge (age 45 here) is counted by np.histogram2d but not by my loop. We can see the element of x that is responsible
>>> print(x[1011])
[280 45]
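The edge behaviour can be checked on a tiny 1-D example. This is a sketch with made-up ages; it compares np.histogram (whose last bin includes its right edge) with the half-open >= and < test used in the manual loop.

```python
import numpy as np

ages = np.array([15, 20, 44, 45])  # made-up values; 45 sits on the last edge
edges = np.arange(15, 50, 5)       # 15, 20, ..., 45

# NumPy: all bins are half-open [lo, hi) except the last, which is [lo, hi].
np_counts, _ = np.histogram(ages, bins=edges)

# Manual counting with >= and < on every bin, as in the loop above.
manual_counts = np.array([
    np.count_nonzero((ages >= lo) & (ages < hi))
    for lo, hi in zip(edges[:-1], edges[1:])
])

print(np_counts)      # [1 1 0 0 0 2]
print(manual_counts)  # [1 1 0 0 0 1]
```

The only difference is the value 45, which lands in the last bin for np.histogram and is dropped by the manual test.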

How to get value of area under multiple peaks

I have some data from a bioanalyzer which gives me time (x-axis) and absorbance values (y-axis). The time step is 0.05 seconds and it runs from 32 s to 138 s, so you can imagine how many data points I have. I've created a graph using plotly and matplotlib, just so that I have more libraries to work with to find a solution, so a solution in either library is ok! What I'm trying to do is make my script find the area under each peak and return that value.
def create_plot(sheet_name):
sample = book.sheet_by_name(sheet_name)
data = [[sample.cell_value(r, c) for r in range(sample.nrows)] for c in range(sample.ncols)]
y = data[2][18:len(data[2]) - 2]
x = np.arange(32, 138.05, 0.05)
indices = peakutils.indexes(y, thres=0.35, min_dist=0.1)
peaks = [y[i] for i in indices]
This snippet gets my Y values, X values and indices of the peaks. Now is there a way to get the area under each curve? Let's say that there are 15 indices.
Here's what the graph looks like:
An automated answer
Given a set of x and y values as well as a set of peaks (the x-coordinates of the peaks), here's how you can automatically find the area under each of the peaks. I'm assuming that x, y, and peaks are all Numpy arrays:
import numpy as np
# find the minima between each peak
ixpeak = x.searchsorted(peaks)
ixmin = np.array([np.argmin(i) for i in np.split(y, ixpeak)])
ixmin[1:] += ixpeak
mins = x[ixmin]
# split up the x and y values based on those minima
xsplit = np.split(x, ixmin[1:-1])
ysplit = np.split(y, ixmin[1:-1])
# find the areas under each peak
areas = [np.trapz(ys, xs) for xs,ys in zip(xsplit, ysplit)]
The example data has been set up so that the area under each peak is (more-or-less) guaranteed to be 1.0, so the results in the bottom plot are correct. The green X marks are the locations of the minimum between each two peaks. The part of the curve "belonging" to each peak is determined as the part of the curve in-between the minima adjacent to each peak.
Complete code
Here's the complete code I used to generate the example data:
import scipy as sp
import scipy.stats
prec = 1e5
n = 10
N = 150
r = np.arange(0, N+1, N//n)
# generate some reasonable fake data
peaks = np.array([np.random.uniform(s, e) for s,e in zip(r[:-1], r[1:])])
x = np.linspace(0, N + n, num=int(prec))
y = np.max([sp.stats.norm.pdf(x, loc=p, scale=.4) for p in peaks], axis=0)
and the code I used to make the plots:
import matplotlib.pyplot as plt
# plotting stuff
plt.subplot(211)
plt.plot(x, y, label='trace 0')
plt.plot(peaks, y[ixpeak], '+', c='red', ms=10, label='peaks')
plt.plot(mins, y[ixmin], 'x', c='green', ms=10, label='mins')
plt.title('Example data')
plt.ylim(-.1, 1.6)
plt.subplot(212)
plt.bar(np.arange(len(areas)), areas)
plt.xlabel('Peak number')
plt.ylabel('Area under peak')
plt.title('Area under the peaks of trace 0')
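When the peaks are already available as indices (as returned by peakutils.indexes in the question), the same idea can be wrapped in a small function. This is a sketch under the assumption that each peak's region runs between the lowest points separating it from its neighbours; peak_areas is an illustrative name, and the trapezoid rule is written out by hand so it does not depend on a particular NumPy version.

```python
import numpy as np


def peak_areas(x, y, peak_idx):
    """Area under each peak, split at the minimum between adjacent peaks."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    peak_idx = np.sort(np.asarray(peak_idx, dtype=int))
    # Index of the lowest point between each pair of neighbouring peaks.
    cuts = [lo + int(np.argmin(y[lo:hi + 1]))
            for lo, hi in zip(peak_idx[:-1], peak_idx[1:])]
    # The first and last regions extend to the ends of the trace.
    bounds = [0] + cuts + [len(x) - 1]
    areas = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        xs, ys = x[a:b + 1], y[a:b + 1]
        # Trapezoid rule on this slice.
        areas.append(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)))
    return np.array(areas)


# Synthetic trace: two unit-area Gaussian peaks (an assumption for testing).
x = np.linspace(0.0, 20.0, 4001)


def gauss(mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


y = gauss(5.0, 0.5) + gauss(15.0, 0.5)
peak_idx = [np.argmin(np.abs(x - 5.0)), np.argmin(np.abs(x - 15.0))]
areas = peak_areas(x, y, peak_idx)
```

For real data with a non-zero baseline, the baseline would need to be subtracted first, otherwise it is included in every area.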

Find an easier way to cluster 2-d scatter data into grid array data

I have figured out a method to cluster scattered point data into a structured 2-d array (like a rasterize function), and I hope there are better ways to achieve that.
My work
1. Intro
The 1000 data points have three properties (lon, lat, emission); each point represents one factory located at (x, y) that emits a certain amount of CO2 into the atmosphere
grid network: a predefined 2-d array of shape 20x20
The code reproduced here:
#### define the map area
xc1,xc2,yc1,yc2 = 113.49805889531724,115.5030664238035,37.39995194888143,38.789235929357105
map = Basemap(llcrnrlon=xc1,llcrnrlat=yc1,urcrnrlon=xc2,urcrnrlat=yc2)
#### reading the point data and scatter plot by their position
df = pd.read_csv("xxxxx.csv")
px,py = map(df.lon, df.lat)
map.scatter(px, py, color = "red", s= 5,zorder =3)
#### predefine the grid networks
lon_grid,lat_grid = np.linspace(xc1,xc2,21), np.linspace(yc1,yc2,21)
lon_x,lat_y = np.meshgrid(lon_grid,lat_grid)
grids = np.zeros(20*20).reshape(20,20)
plt.pcolormesh(lon_x,lat_y,grids,cmap = 'gray', facecolor = 'none',edgecolor = 'k',zorder=3)
2. My target
Finding the nearest grid point for each factory
Add the emission data into this grid number
3. Algorithm realization
3.1 Raster grid
note: 20x20 grid points are distributed in this area represented by blue dot.
3.2 KD-tree
Find the nearest blue dot of each red point
sh = (20*20,2)
grids = np.zeros(20*20*2).reshape(*sh)
sh_emission = (20*20)
grids_em = np.zeros(20*20).reshape(sh_emission)
k = 0
for j in range(0,yy.shape[0],1):
for i in range(0,xx.shape[0],1):
grids[k] = np.array([lon_grid[i],lat_grid[j]])
T = KDTree(grids)
x_delta = (lon_grid[2] - lon_grid[1])
y_delta = (lat_grid[2] - lat_grid[1])
R = np.sqrt(x_delta**2 + y_delta**2)
for i in range(0,len(df.lon),1):
idx = T.query_ball_point([df.lon.iloc[i],df.lat.iloc[i]], r=R)
# there are more than one blue dot which are founded sometimes,
# So I'll calculate the distances between the factory(red point)
# and all blue dots which are listed
if (idx > 1):
distance = []
for k in range(0,len(idx),1):
distance.append(np.sqrt((df.lon.iloc[i] - grids[k][0])**2 + (df.lat.iloc[i] - grids[k][1])**2))
pos_index = distance.index(min(distance))
pos = idx[pos_index]
# Only find 1 point
pos = idx
grids_em[pos] += df.so2[i]
4. Result
co2 = grids_em.reshape(20,20)
5. My question
Can someone point out any drawbacks or errors in this method?
Are there algorithms better suited to my target?
Thanks a lot!
There are many for loops in your code, which is not the NumPy way.
Make some sample data first:
import numpy as np
import pandas as pd
from scipy.spatial import KDTree
import pylab as pl
xc1, xc2, yc1, yc2 = 113.49805889531724, 115.5030664238035, 37.39995194888143, 38.789235929357105
N = 1000
GSIZE = 20
x, y = np.random.multivariate_normal([(xc1 + xc2)*0.5, (yc1 + yc2)*0.5], [[0.1, 0.02], [0.02, 0.1]], size=N).T
value = np.ones(N)
df_points = pd.DataFrame({"x":x, "y":y, "v":value})
For equally spaced grids you can use hist2d():
pl.hist2d(df_points.x, df_points.y, weights=df_points.v, bins=20, cmap="viridis");
Here is the output:
Here is the code to use KDTree:
X, Y = np.mgrid[x.min():x.max():GSIZE*1j, y.min():y.max():GSIZE*1j]
grid = np.c_[X.ravel(), Y.ravel()]
points = np.c_[df_points.x, df_points.y]
tree = KDTree(grid)
dist, indices = tree.query(points)
grid_values = df_points.groupby(indices).v.sum()
df_grid = pd.DataFrame(grid, columns=["x", "y"])
df_grid["v"] = grid_values
fig, ax = pl.subplots(figsize=(10, 8))
ax.plot(df_points.x, df_points.y, "kx", alpha=0.2)
mapper = ax.scatter(df_grid.x, df_grid.y, c=df_grid.v,
s=100, marker="o")
pl.colorbar(mapper, ax=ax);
the output is:
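If only the gridded totals are needed and not the plot, np.histogram2d with weights gives the same per-cell sums as hist2d. The sketch below uses synthetic factory positions and emissions (assumptions, not the real CSV) on a 20x20 grid defined by 21 edges per axis.

```python
import numpy as np

# Synthetic factories inside the map area (made-up data for illustration).
rng = np.random.default_rng(0)
lon = rng.uniform(113.5, 115.5, size=1000)
lat = rng.uniform(37.4, 38.8, size=1000)
emission = rng.uniform(0.0, 10.0, size=1000)

# 21 edges per axis define a 20x20 grid of cells.
lon_edges = np.linspace(113.5, 115.5, 21)
lat_edges = np.linspace(37.4, 38.8, 21)

# Each cell receives the sum of the emissions of the factories inside it.
grid, _, _ = np.histogram2d(lon, lat, bins=(lon_edges, lat_edges),
                            weights=emission)
```

Here grid[i, j] is the total for the cell between lon_edges[i] and lon_edges[i+1] and between lat_edges[j] and lat_edges[j+1]. This assigns each factory to the cell that contains it, which is slightly different from snapping to the nearest grid node as the KDTree version does.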

Referencing Data From a 2D Histogram

I have the following code that reads data from a CSV file and creates a 2D histogram:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
#Read in CSV data
filename = 'Complete_Storms_All_US_Only.csv'
df = pd.read_csv(filename)
min_85 = df.min85
min_37 = df.min37
verification = df.one_min_15
x = min_85
y = min_37
H = verification
#Estimate the 2D histogram
nbins = 33
H, xedges, yedges = np.histogram2d(x,y,bins=nbins)
#Rotate and flip H
H = np.rot90(H)
H = np.flipud(H)
#Mask zeros
Hmasked = np.ma.masked_where(H==0,H)
#Calculate Averages
avgarr = np.zeros((nbins, nbins))
xbins = np.digitize(x, xedges[1:-1])
ybins = np.digitize(y, yedges[1:-1])
for xb, yb, v in zip(xbins, ybins, verification):
avgarr[yb, xb] += v
divisor = H.copy()
divisor[divisor==0.0] = np.nan
avgarr /= divisor
binavg = np.around((avgarr * 100), decimals=1)
binper = np.ma.array(binavg, mask=np.isnan(binavg))
#Plot 2D histogram using pcolor
fig1 = plt.figure()
plt.title('1 minute at +/- 0.15 degrees')
plt.xlabel('min 85 GHz PCT (K)')
plt.ylabel('min 37 GHz PCT (K)')
cbar = plt.colorbar()
cbar.ax.set_ylabel('Probability of CG Lightning (%)')
Each pixel in the histogram contains the probability of lightning for a given range of temperatures at two different frequencies on the x and y axis (min_85 on the x axis and min_37 on the y axis). I am trying to reference the probability of lightning from the histogram based on a wide range of temperatures that vary on an individual basis for any given storm. Each storm has a min_85 and min_37 that corresponds to a probability from the 2D histogram. I know there is a brute-force method where you can create a ridiculous amount of if statements, with one for each pixel, but this is tedious and inefficient when trying to incorporate over multiple 2D histograms. Is there a more efficient way to reference the probability from the histogram based on the given min_85 and min_37? I have a separate file with the min_85 and min_37 data for a large amount of storms, I just need to assign the corresponding probability of lightning from the histogram to each one.
It sounds like all you need to do is turn the min_85 and min_37 values into indices. Something like this will work:
# min85data and min37data from your file
dx = xedges[1] - xedges[0]
dy = yedges[1] - yedges[0]
min85inds = np.floor((min85data - xedges[0]) / dx).astype(int)
min37inds = np.floor((min37data - yedges[0]) / dy).astype(int)
# Pretend you didn't do all that flipping of H, or make a copy of it first
hvals = h_orig[min85inds, min37inds]
But do make sure that the resulting indices are valid before you extract them.
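One way to build in that validity check is sketched below. It uses np.searchsorted on the bin edges, so it also works for unevenly spaced bins, and returns NaN for points that fall outside the histogram. The name lookup_hist2d is illustrative, and values is assumed to be indexed as [x bin, y bin], the orientation np.histogram2d returns before any rotating or flipping.

```python
import numpy as np


def lookup_hist2d(values, xedges, yedges, xq, yq):
    """Return values[ix, iy] for each query point, NaN if out of range."""
    xq = np.asarray(xq, dtype=float)
    yq = np.asarray(yq, dtype=float)
    # Bin index = number of edges less than or equal to the value, minus one.
    ix = np.searchsorted(xedges, xq, side="right") - 1
    iy = np.searchsorted(yedges, yq, side="right") - 1
    # Points exactly on the last edge belong to the last bin.
    ix = np.where(xq == xedges[-1], len(xedges) - 2, ix)
    iy = np.where(yq == yedges[-1], len(yedges) - 2, iy)
    valid = ((ix >= 0) & (ix < len(xedges) - 1) &
             (iy >= 0) & (iy < len(yedges) - 1))
    out = np.full(xq.shape, np.nan)
    out[valid] = values[ix[valid], iy[valid]]
    return out


# Tiny made-up histogram: 2 bins in x, 3 bins in y.
values = np.arange(6).reshape(2, 3)
xedges = np.array([0.0, 1.0, 2.0])
yedges = np.array([0.0, 1.0, 2.0, 3.0])
probs = lookup_hist2d(values, xedges, yedges,
                      xq=[0.5, 1.5, 5.0, 2.0],
                      yq=[0.5, 2.5, 1.0, 3.0])
# probs is [0., 5., nan, 5.]
```

The third query point lies outside the x range, so it gets NaN instead of raising an IndexError or silently wrapping around.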

Plot periodic trajectories

I have some data of a particle moving in a corridor with closed boundary conditions.
Plotting the trajectory leads to a zig-zag trajectory.
I would like to know how to prevent plot() from connecting the points where the particle comes back to the start. Something like in the upper part of the pic, but without "."
The first idea I had was to find the index where the numpy array a[:-1]-a[1:] becomes positive and then plot from 0 to that index. But how would I get the index of the first occurrence of a positive element of a[:-1]-a[1:]?
Maybe there are some other ideas.
I'd take a different approach. First, I'd determine the jump points not by looking at the sign of the derivative, since the movement might go up or down, or even have some periodicity in it. Instead, I'd look at the points with the biggest derivative.
Second, an elegant approach to have breaks in a plot line is to mask one value on each jump. Then matplotlib will make segments automatically. My code is:
import pylab as plt
import numpy as np
xs = np.linspace(0., 100., 1000)
data = (xs*0.03 + np.sin(xs) * 0.1) % 1
plt.plot(xs, data, "r-")
#Make a masked array with jump points masked
abs_d_data = np.abs(np.diff(data))
mask = np.hstack([ abs_d_data > abs_d_data.mean()+3*abs_d_data.std(), [False]])
masked_data = np.ma.MaskedArray(data, mask)
plt.plot(xs, masked_data, "b-")
And gives us as result:
The disadvantage of course is that you lose one point at each break - but with the sampling rate you seem to have I guess you can trade this in for simpler code.
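A variation that keeps every data point is to insert a NaN between the two samples on either side of each jump, since matplotlib also breaks a line at NaN values. This is a sketch with a fixed jump threshold, which is an assumption and should be chosen relative to the period; break_at_jumps is an illustrative name.

```python
import numpy as np


def break_at_jumps(x, y, threshold):
    """Insert NaN after every step in y that is larger than threshold."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Index of the first sample after each jump.
    jump_idx = np.nonzero(np.abs(np.diff(y)) > threshold)[0] + 1
    # np.insert places the NaN before that sample, in both arrays.
    return np.insert(x, jump_idx, np.nan), np.insert(y, jump_idx, np.nan)


xs = np.linspace(0.0, 100.0, 1000)
data = (xs * 0.03 + np.sin(xs) * 0.1) % 1
xb, yb = break_at_jumps(xs, data, threshold=0.5)
```

Passing xb and yb to plt.plot then draws separate segments, with no original sample masked out.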
To find where the particle has crossed the upper boundary, you can do something like this:
>>> import numpy as np
>>> a = np.linspace(0, 10, 50) % 5 # some sample data
>>> np.nonzero(np.diff(a) < 0)[0] + 1
array([25, 49])
>>> a[24:27]
array([ 4.89795918, 0.10204082, 0.30612245])
>>> a[48:]
array([ 4.79591837, 0. ])
np.diff(a) calculates the discrete difference of a, while np.nonzero finds where the condition np.diff(a) < 0 is true, i.e., where the value drops from one sample to the next because the particle has wrapped around.
To avoid the connecting line you will have to plot by segments.
Here's a quick way to plot by segments when the derivative of a changes sign:
import numpy as np
import matplotlib.pyplot as plt
a = np.linspace(0, 20, 50) % 5 # similar to Micheal's sample data
x = np.arange(50) # x scale
indices = np.where(np.diff(a) < 0)[0] + 1 # the same as Micheal's np.nonzero
for n, i in enumerate(indices):
    if n == 0:
        plt.plot(x[:i], a[:i], 'b-')
    else:
        plt.plot(x[indices[n - 1]:i], a[indices[n - 1]:i], 'b-')
Based on Thorsten Kranz's answer, here is a version which adds points to the original data where 'y' crosses the period. This is important if the density of data points isn't very high, e.g. np.linspace(0., 100., 100) vs. the original np.linspace(0., 100., 1000). The x positions of the curve transitions are linearly interpolated. Wrapped up in a function, it is:
import numpy as np
def periodic2plot(x, y, period=np.pi*2.):
    indexes = np.argwhere(np.abs(np.diff(y))>.5*period).flatten()
    index_shift = 0
    for i in indexes:
        i += index_shift
        index_shift += 3 # in every loop it adds 3 elements
        if y[i] > .5*period:
            x_transit = np.interp(period, np.unwrap(y[i:i+2], period=period), x[i:i+2])
            add = np.ma.array([ period, 0., 0.], mask=[0,1,0])
        else:
            # interpolate needs sorted xp = np.unwrap(y[i:i+2], period=period)
            x_transit = np.interp(0, np.unwrap(y[i:i+2], period=period)[::-1], x[i:i+2][::-1])
            add = np.ma.array([ 0., 0., period], mask=[0,1,0])
        x_add = np.ma.array([x_transit]*3, mask=[0,1,0])
        x = np.ma.hstack((x[:i+1], x_add, x[i+1:]))
        y = np.ma.hstack((y[:i+1], add, y[i+1:]))
    return x, y
The code for comparison to the original answer of Thorsten Kranz with lower data-points density.
import matplotlib.pyplot as plt
x = np.linspace(0., 100., 100)
y = (x*0.03 + np.sin(x) * 0.1) % 1
#Thorsten Kranz: Make a masked array with jump points masked
abs_d_data = np.abs(np.diff(y))
mask = np.hstack([np.abs(np.diff(y))>.5, [False]])
masked_y = np.ma.MaskedArray(y, mask)
# Plot
plt.plot(*periodic2plot(x, y, period=1), label='This answer')
plt.plot(x, masked_y, label='Thorsten Kranz')
plt.autoscale(enable=True, axis='both', tight=True)

