get bins coordinates with hexbin in matplotlib

I use matplotlib's hexbin method to compute 2D histograms of my data.
I would like to get the coordinates of the hexagon centers so that I can process the results further.
I got the values by calling get_array() on the result, but I cannot figure out how to get the bin coordinates.
I tried to compute them from the number of bins and the extent of my data, but I don't know the exact number of bins in each direction. gridsize=(10,2) should do the trick, but it does not seem to work.
Any ideas?

I think this works.
from __future__ import division
import numpy as np
import math
import matplotlib.pyplot as plt

def generate_data(n):
    """Make random, correlated x & y arrays"""
    points = np.random.multivariate_normal(mean=(0,0),
                                           cov=[[0.4,9],[9,10]],size=int(n))
    return points

if __name__ =='__main__':
    color_map = plt.cm.Spectral_r
    n = 1e4
    points = generate_data(n)
    xbnds = np.array([-20.0,20.0])
    ybnds = np.array([-20.0,20.0])
    extent = [xbnds[0],xbnds[1],ybnds[0],ybnds[1]]
    fig=plt.figure(figsize=(10,9))
    ax = fig.add_subplot(111)
    x, y = points.T
    # Set gridsize just to make them visually large
    image = plt.hexbin(x,y,cmap=color_map,gridsize=20,extent=extent,mincnt=1,bins='log')
    # Note that mincnt=1 keeps only the cells that contain at least one point
    counts = image.get_array()
    ncnts = np.count_nonzero(np.power(10,counts))
    verts = image.get_offsets()
    # Mark the center of every non-empty hexagon
    for offc in range(verts.shape[0]):
        binx,biny = verts[offc][0],verts[offc][1]
        if counts[offc]:
            plt.plot(binx,biny,'k.',zorder=100)
    ax.set_xlim(xbnds)
    ax.set_ylim(ybnds)
    plt.grid(True)
    cb = plt.colorbar(image,spacing='uniform',extend='max')
    plt.show()

I would love to confirm that Hooked's code using get_offsets() works, but I tried several variations of it to retrieve the center positions and, as Dave mentioned, get_offsets() remains empty. The workaround I found is to use image.get_paths(), which is not empty. My code takes the mean of the vertices to find each center, so it is a smidge longer, but it does work.
get_paths() returns the x,y coordinates of each hexagon's vertices, which can be looped over and averaged to give the center position of each hexagon.
The code that I have is as follows:
counts=image.get_array() #counts in each hexagon, works great
verts=image.get_offsets() #empty, don't use this
b=image.get_paths() #this does work, gives Path([[]][]) which can be plotted
for i in range(len(b)):
    xav=np.mean(b[i].vertices[0:6,0]) #center in x (RA)
    yav=np.mean(b[i].vertices[0:6,1]) #center in y (DEC)
    plt.plot(xav,yav,'k.',zorder=100)

I had this same problem. I think what is needed is a framework with a HexagonalGrid object that can be applied to many different data sets (and it would be awesome to do it for N dimensions). This is possible, and it surprises me that neither SciPy nor NumPy has anything for it (furthermore, there seems to be nothing else like it except perhaps binify).
That said, I assume you want to use hexbinning to compare multiple binned data sets. This requires a common base. I got this to work using matplotlib's hexbin in the following way:
import numpy as np
import matplotlib.pyplot as plt

def get_data (mean,cov,n=1e3):
    """
    Quick fake data builder
    """
    np.random.seed(101)
    points = np.random.multivariate_normal(mean=mean,cov=cov,size=int(n))
    x, y = points.T
    return x,y

def get_centers (hexbin_output):
    """
    About 40% faster than the previous post, only because the min/max is not
    recalculated every time
    """
    paths = hexbin_output.get_paths()
    v = paths[0].vertices[:-1] # drop the extra [0,0] value added at the end
    vx,vy = v.T
    idx = [3,0,5,2] # index for [xmin,xmax,ymin,ymax]
    xmin,xmax,ymin,ymax = vx[idx[0]],vx[idx[1]],vy[idx[2]],vy[idx[3]]
    half_width_x = abs(xmax-xmin)/2.0
    half_width_y = abs(ymax-ymin)/2.0
    centers = []
    for i in range(len(paths)):
        cx = paths[i].vertices[idx[0],0]+half_width_x
        cy = paths[i].vertices[idx[2],1]+half_width_y
        centers.append((cx,cy))
    return np.asarray(centers)

# important parts ==>
class Hexagonal2DGrid (object):
    """
    Used to fix the gridsize, extent, and bins
    """
    def __init__ (self,gridsize,extent,bins=None):
        self.gridsize = gridsize
        self.extent = extent
        self.bins = bins

def hexbin (x,y,hexgrid):
    """
    To hexagonally bin the data in 2 dimensions
    """
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # Note mincnt=0 so that it will return a value for every point in the
    # hexgrid, not just those with count>mincnt
    # Basically you fix the gridsize, extent, and bins to keep them the same
    # then the resulting count array is the same
    hexbin = plt.hexbin(x,y, mincnt=0,
                        gridsize=hexgrid.gridsize,
                        extent=hexgrid.extent,
                        bins=hexgrid.bins)
    # you could close the figure if you don't want it
    # plt.close(fig.number)
    counts = hexbin.get_array().copy()
    return counts, hexbin

# Example ===>
if __name__ == "__main__":
    hexgrid = Hexagonal2DGrid((21,5),[-70,70,-20,20])
    x_data,y_data = get_data((0,0),[[-40,95],[90,10]])
    x_model,y_model = get_data((0,10),[[100,30],[3,30]])
    counts_data, hexbin_data = hexbin(x_data,y_data,hexgrid)
    counts_model, hexbin_model = hexbin(x_model,y_model,hexgrid)
    # if you want the centers, they will be the same for both
    centers = get_centers(hexbin_data)
    # if you want to ignore the cells with zeros then use the following mask.
    # But if you want zeros for some bins and not others, I'm not sure there is
    # an elegant way to do this without using the centers
    nonzero = counts_data != 0
    # now you can compare the two data sets
    variance_data = counts_data[nonzero]
    square_diffs = (counts_data[nonzero]-counts_model[nonzero])**2
    chi2 = np.sum(square_diffs/variance_data)
    print(" chi2={}".format(chi2))
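For comparison, here is a minimal sketch of reading the centers straight from the hexbin result. It assumes a reasonably recent Matplotlib release in which get_offsets() is populated; the answers above disagree on this for older versions, so treat it as something to check against your installation. The random data and the names hb, counts and centers are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Made-up data that lies well inside the extent used below
rng = np.random.default_rng(101)
x = rng.normal(0.0, 5.0, 2000)
y = rng.normal(0.0, 2.0, 2000)

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=(21, 5), extent=[-70, 70, -20, 20], mincnt=1)

counts = hb.get_array()     # one value per drawn hexagon
centers = hb.get_offsets()  # (N, 2) array, one (x, y) center per drawn hexagon

# Show the first few center/count pairs
for (cx, cy), c in list(zip(centers, counts))[:5]:
    print(f"center=({cx:.2f}, {cy:.2f}) count={c:.0f}")
```

If centers comes back empty on your version, the get_paths() workaround described above is the fallback.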

Related

Finding two linear fits on different domains in the same data

I'm trying to plot a 3rd-order polynomial, and two linear fits on the same set of data. My data looks like this:
,Frequency,Flux Density,log_freq,log_flux
0,1.25e+18,1.86e-07,18.096910013008056,-6.730487055782084
1,699000000000000.0,1.07e-06,14.84447717574568,-5.97061622231479
2,541000000000000.0,1.1e-06,14.73319726510657,-5.958607314841775
3,468000000000000.0,1e-06,14.670245853074125,-6.0
4,458000000000000.0,1.77e-06,14.660865478003869,-5.752026733638194
5,89400000000000.0,3.01e-05,13.951337518795917,-4.521433504406157
6,89400000000000.0,9.3e-05,13.951337518795917,-4.031517051446065
7,89400000000000.0,0.00187,13.951337518795917,-2.728158393463501
8,65100000000000.0,2.44e-05,13.813580988568193,-4.61261017366127
9,65100000000000.0,6.28e-05,13.813580988568193,-4.2020403562628035
10,65100000000000.0,0.00108,13.813580988568193,-2.96657624451305
11,25900000000000.0,0.000785,13.413299764081252,-3.1051303432547472
12,25900000000000.0,0.00106,13.413299764081252,-2.9746941347352296
13,25900000000000.0,0.000796,13.413299764081252,-3.099086932262331
14,13600000000000.0,0.00339,13.133538908370218,-2.469800301796918
15,13600000000000.0,0.00372,13.133538908370218,-2.4294570601181023
16,13600000000000.0,0.00308,13.133538908370218,-2.5114492834995557
17,12700000000000.0,0.00222,13.103803720955957,-2.653647025549361
18,12700000000000.0,0.00204,13.103803720955957,-2.6903698325741012
19,230000000000.0,0.133,11.361727836017593,-0.8761483590329142
22,90000000000.0,0.518,10.954242509439325,-0.28567024025476695
23,61000000000.0,1.0,10.785329835010767,0.0
24,61000000000.0,0.1,10.785329835010767,-1.0
25,61000000000.0,0.4,10.785329835010767,-0.3979400086720376
26,42400000000.0,0.8,10.627365856592732,-0.09691001300805639
27,41000000000.0,0.9,10.612783856719735,-0.045757490560675115
28,41000000000.0,0.7,10.612783856719735,-0.1549019599857432
29,41000000000.0,0.8,10.612783856719735,-0.09691001300805639
30,41000000000.0,0.6,10.612783856719735,-0.2218487496163564
31,41000000000.0,0.7,10.612783856719735,-0.1549019599857432
32,37000000000.0,1.0,10.568201724066995,0.0
33,36800000000.0,1.0,10.565847818673518,0.0
34,36800000000.0,0.98,10.565847818673518,-0.00877392430750515
35,33000000000.0,0.8,10.518513939877888,-0.09691001300805639
36,33000000000.0,1.0,10.518513939877888,0.0
37,31400000000.0,0.92,10.496929648073214,-0.036212172654444715
38,23000000000.0,1.4,10.361727836017593,0.146128035678238
39,23000000000.0,1.1,10.361727836017593,0.04139268515822508
40,23000000000.0,1.11,10.361727836017593,0.045322978786657475
41,23000000000.0,1.1,10.361727836017593,0.04139268515822508
42,22200000000.0,1.23,10.346352974450639,0.08990511143939793
43,22200000000.0,1.24,10.346352974450639,0.09342168516223506
44,21700000000.0,0.98,10.33645973384853,-0.00877392430750515
45,21700000000.0,1.07,10.33645973384853,0.029383777685209667
46,20000000000.0,1.44,10.301029995663981,0.15836249209524964
47,15400000000.0,1.32,10.187520720836464,0.12057393120584989
48,15000000000.0,1.5,10.176091259055681,0.17609125905568124
49,15000000000.0,1.5,10.176091259055681,0.17609125905568124
50,15000000000.0,1.42,10.176091259055681,0.15228834438305647
51,15000000000.0,1.43,10.176091259055681,0.1553360374650618
52,15000000000.0,1.42,10.176091259055681,0.15228834438305647
53,15000000000.0,1.47,10.176091259055681,0.1673173347481761
54,15000000000.0,1.38,10.176091259055681,0.13987908640123647
55,10700000000.0,2.59,10.02938377768521,0.4132997640812518
56,8870000000.0,2.79,9.947923619831727,0.44560420327359757
57,8460000000.0,2.69,9.927370363039023,0.42975228000240795
58,8400000000.0,2.8,9.924279286061882,0.4471580313422192
59,8400000000.0,2.53,9.924279286061882,0.40312052117581787
60,8400000000.0,2.06,9.924279286061882,0.31386722036915343
61,8300000000.0,2.58,9.919078092376074,0.41161970596323016
62,8080000000.0,2.76,9.907411360774587,0.4409090820652177
63,5010000000.0,3.68,9.699837725867246,0.5658478186735176
64,5000000000.0,0.81,9.698970004336019,-0.09151498112135022
65,5000000000.0,3.5,9.698970004336019,0.5440680443502757
66,5000000000.0,3.57,9.698970004336019,0.5526682161121932
67,4980000000.0,3.46,9.697229342759718,0.5390760987927766
68,4900000000.0,2.95,9.690196080028514,0.46982201597816303
69,4850000000.0,3.46,9.685741738602264,0.5390760987927766
70,4850000000.0,3.45,9.685741738602264,0.5378190950732742
71,4780000000.0,2.16,9.679427896612118,0.3344537511509309
72,4540000000.0,3.61,9.657055852857104,0.557507201905658
73,2700000000.0,3.5,9.431363764158988,0.5440680443502757
74,2700000000.0,3.7,9.431363764158988,0.568201724066995
75,2700000000.0,3.92,9.431363764158988,0.5932860670204573
76,2700000000.0,3.92,9.431363764158988,0.5932860670204573
77,2250000000.0,4.21,9.352182518111363,0.6242820958356683
78,1660000000.0,3.69,9.220108088040055,0.5670263661590603
79,1660000000.0,3.8,9.220108088040055,0.5797835966168101
80,1410000000.0,3.5,9.14921911265538,0.5440680443502757
81,1400000000.0,3.45,9.146128035678238,0.5378190950732742
82,1400000000.0,3.28,9.146128035678238,0.5158738437116791
83,1400000000.0,3.19,9.146128035678238,0.5037906830571811
84,1400000000.0,3.51,9.146128035678238,0.5453071164658241
85,1340000000.0,3.31,9.127104798364808,0.5198279937757188
86,1340000000.0,3.31,9.127104798364808,0.5198279937757188
87,750000000.0,3.14,8.8750612633917,0.49692964807321494
88,408000000.0,1.46,8.61066016308988,0.1643528557844371
89,408000000.0,1.46,8.61066016308988,0.1643528557844371
90,365000000.0,1.62,8.562292864456476,0.20951501454263097
91,365000000.0,1.56,8.562292864456476,0.1931245983544616
92,333000000.0,1.32,8.52244423350632,0.12057393120584989
93,302000000.0,1.23,8.48000694295715,0.08990511143939793
94,151000000.0,2.13,8.178976947293169,0.3283796034387377
95,73800000.0,3.58,7.868056361823042,0.5538830266438743
and my code is
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import numpy.polynomial.polynomial as poly

def find_extrema(poly, bounds):
    '''
    Finds the extrema of the polynomial; ensure real.
    https://stackoverflow.com/questions/72932816/python-finding-local-maxima-minima-for-multiple-polynomials-efficiently
    '''
    deriv = poly.deriv()
    extrema = deriv.roots()
    # Filter out complex roots
    extrema = extrema[np.isreal(extrema)]
    # Get real part of root
    extrema = np.real(extrema)
    # Apply bounds check
    lb, ub = bounds
    extrema = extrema[(lb <= extrema) & (extrema <= ub)]
    return extrema

def find_maximum(poly, bounds):
    '''
    Find the maximum point; returns the value of the turnover frequency.
    https://stackoverflow.com/questions/72932816/python-finding-local-maxima-minima-for-multiple-polynomials-efficiently
    '''
    extrema = find_extrema(poly, bounds)
    # Either bound could end up being the maximum. Check those too.
    extrema = np.concatenate((extrema, bounds))
    value_at_extrema = poly(extrema)
    maximum_index = np.argmax(value_at_extrema)
    return extrema[maximum_index]

# LOAD THE DATA FROM FILE HERE
# CARRY ON...
xvar = 'log_freq'
yvar = 'log_flux'
x, y = pks[xvar], pks[yvar]
lower = min(x)
upper = max(x)
# Find the 3rd-order polynomial which fits the SED
coefs = poly.polyfit(x, y, 3) # find the coeffs
x_new = np.linspace(lower, upper, num=len(x)*10) # space to plot the fit
ffit = poly.Polynomial(coefs) # find the polynomial
# Find turnover frequency and peak flux
nu_to = find_maximum(ffit, (lower, upper))
F_p = ffit(nu_to)
# HERE'S THE TRICKY BIT
# Find the straight line to fit to the left of nu_to
left_linefit = poly.polyfit(x, y, 1)
x_left = np.linspace(lower, nu_to, num=len(x)*10) # space to plot the fit
ffit_left = poly.Polynomial(left_linefit,
                            domain = (lower, nu_to)
                            )
# PLOTS THE POLYNOMIAL WELL
ax1 = plt.subplot(1, 1, 1)
ax1.scatter(pks[xvar], pks[yvar], label = 'PKS 0742+10', c = 'b')
ax1.plot(x_new, ffit(x_new), color = 'r')
ax1.plot(x_left, ffit_left(x_left), color = 'gold')
ax1.set_yscale('linear')
ax1.set_xscale('linear')
ax1.legend()
ax1.set_xlabel(r'$\log\nu$ ($\nu$ in Hz)')
ax1.set_ylabel(r'$\log F_{\nu}$ ($F_{\nu}$ in Jy)')
ax1.grid(axis = 'both', which = 'major')
The code produces the polynomial fit well.
I'm trying to plot the straight-line fits for the points on either side of the maximum.
I thought I could do it with
ffit_left = poly.Polynomial(left_linefit,
                            domain = (lower, nu_to)
                            )
and similar for ffit_right, but that produces a line which is actually the straight-line fit for the whole dataset, plotted only over that domain. I don't want to manipulate the dataset, because eventually I'll have to do this on a lot of datasets.
The fitting part of the code comes from an answer to this question.
How can I fit a straight line to just a subset of the points without manipulating the dataset?
My guess is that I have to make left_linefit = poly.polyfit(x, y, 1) recognise a domain, but I can't see anything about that in the numpy polyfit docs.
Sorry for the long question!

I am not sure I understand your request correctly. If you want to fit a piecewise function made of three linear segments, a method is described in https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf with theory and numerical examples.
Several cases are considered. Among them the case below might be convenient for you.
H(*) is the Heaviside step function.
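A simpler alternative to a full piecewise regression, sketched here under assumptions, is to select the points on each side of the turnover with a boolean mask and pass only those to polyfit. The mask creates a temporary view of the selection and leaves the original arrays untouched. The synthetic x and y and the value of nu_to are invented stand-ins for the log_freq and log_flux columns and for the result of find_maximum. The sketch omits the domain argument of Polynomial because that argument remaps the input interval rather than restricting which points are fitted.

```python
import numpy as np
import numpy.polynomial.polynomial as poly

# Synthetic stand-in for the (log_freq, log_flux) columns: rising, then falling
rng = np.random.default_rng(0)
x = np.linspace(8.0, 14.0, 61)
y_true = np.where(x < 10.0, 0.5 * (x - 10.0), -1.0 * (x - 10.0))
y = y_true + rng.normal(scale=0.02, size=x.size)

nu_to = 10.0  # hypothetical turnover; in the question it comes from find_maximum

# Boolean masks pick the points on each side without modifying x or y
left = x <= nu_to
right = x > nu_to

# polyfit returns coefficients in increasing order: [intercept, slope]
left_coefs = poly.polyfit(x[left], y[left], 1)
right_coefs = poly.polyfit(x[right], y[right], 1)

ffit_left = poly.Polynomial(left_coefs)
ffit_right = poly.Polynomial(right_coefs)

# Grids on which each line would be plotted
x_left = np.linspace(x.min(), nu_to, 50)
x_right = np.linspace(nu_to, x.max(), 50)
print("left slope:", left_coefs[1], "right slope:", right_coefs[1])
```

With a pandas DataFrame the same masks work on the columns directly, for example pks[xvar][left], so the approach carries over to many datasets without editing any of them.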

Number of arrowheads on matplotlib streamplot

Is there any way to increase the number of arrowheads on a matplotlib streamplot? Right now it appears as if there is only one arrowhead per streamline, which is a problem if I want to change the x/y axis limits to zoom in on the data.

Building on @Richard_wth's answer, I wrote a function that gives control over the location of the arrows on a streamplot. One can choose n arrows per streamline, or have the arrows equally spaced along each streamline.
First, do a normal streamplot until you are happy with the location and number of streamlines, and keep the returned object sp. For instance:
sp = ax.streamplot(x,y,u,v,arrowstyle='-',density=10)
What's important here is to set arrowstyle='-' so that the default arrows are not displayed.
Then call the function streamQuiver (provided below) to control the arrows on each streamline. If you want 3 arrows per streamline:
streamQuiver(ax, sp, n=3, ...)
If you want an arrow every 1.5 units of curvilinear length:
streamQuiver(ax, sp, spacing=1.5, ...)
where ... are options that are passed on to quiver.
The function streamQuiver is probably not fully bulletproof and may need additional handling for particular cases. It relies on 4 subfunctions:
curve_coord, to get the curvilinear length along a path
curve_extract, to extract equidistant points along a path
seg_to_lines, to convert the segments from streamplot into continuous lines. There might be a better way to do that!
lines_to_arrows, the main function, which extracts the arrows on each line
Here's an example where the arrows are at equidistant points on each streamline.
import numpy as np
import matplotlib.pyplot as plt

def streamQuiver(ax,sp,*args,spacing=None,n=5,**kwargs):
    """ Plot arrows from streamplot data
    The number of arrows per streamline is controlled either by `spacing` or by `n`.
    See `lines_to_arrows`.
    """
    def curve_coord(line=None):
        """ return curvilinear coordinate """
        x=line[:,0]
        y=line[:,1]
        s = np.zeros(x.shape)
        s[1:] = np.sqrt((x[1:]-x[0:-1])**2+ (y[1:]-y[0:-1])**2)
        s = np.cumsum(s)
        return s

    def curve_extract(line,spacing,offset=None):
        """ Extract points at equidistant space along a curve"""
        x=line[:,0]
        y=line[:,1]
        if offset is None:
            offset=spacing/2
        # Computing curvilinear length
        s = curve_coord(line)
        offset=np.mod(offset,s[-1]) # making sure we always get one point
        # New (equidistant) curvilinear coordinate
        sExtract=np.arange(offset,s[-1],spacing)
        # Interpolating based on new curvilinear coordinate
        xx=np.interp(sExtract,s,x)
        yy=np.interp(sExtract,s,y)
        return np.array([xx,yy]).T

    def seg_to_lines(seg):
        """ Convert a list of segments to a list of lines """
        def extract_continuous(i):
            x=[]
            y=[]
            # Special case, we have only 1 segment remaining:
            if i==len(seg)-1:
                x.append(seg[i][0,0])
                y.append(seg[i][0,1])
                x.append(seg[i][1,0])
                y.append(seg[i][1,1])
                return i,x,y
            # Looping on continuous segment
            while i<len(seg)-1:
                # Adding our start point
                x.append(seg[i][0,0])
                y.append(seg[i][0,1])
                # Checking whether next segment continues our line
                Continuous= all(seg[i][1,:]==seg[i+1][0,:])
                if not Continuous:
                    # We add our end point then
                    x.append(seg[i][1,0])
                    y.append(seg[i][1,1])
                    break
                elif i==len(seg)-2:
                    # we add the last segment
                    x.append(seg[i+1][0,0])
                    y.append(seg[i+1][0,1])
                    x.append(seg[i+1][1,0])
                    y.append(seg[i+1][1,1])
                i=i+1
            return i,x,y
        lines=[]
        i=0
        while i<len(seg):
            iEnd,x,y=extract_continuous(i)
            lines.append(np.array( [x,y] ).T)
            i=iEnd+1
        return lines

    def lines_to_arrows(lines,n=5,spacing=None,normalize=True):
        """ Extract "streamlines" arrows from a set of lines
        Either: `n` arrows per line
        or an arrow every `spacing` distance
        If `normalize` is true, the arrows have a unit length
        """
        if spacing is None:
            # if n is provided we estimate the spacing based on each curve length
            spacing = [ curve_coord(l)[-1]/n for l in lines]
        try:
            len(spacing)
        except TypeError:
            # a scalar spacing was given: use it for every line
            spacing=[spacing]*len(lines)
        lines_s=[curve_extract(l,spacing=sp,offset=sp/2) for l,sp in zip(lines,spacing)]
        lines_e=[curve_extract(l,spacing=sp,offset=sp/2+0.01*sp) for l,sp in zip(lines,spacing)]
        arrow_x = [l[i,0] for l in lines_s for i in range(len(l))]
        arrow_y = [l[i,1] for l in lines_s for i in range(len(l))]
        arrow_dx = [le[i,0]-ls[i,0] for ls,le in zip(lines_s,lines_e) for i in range(len(ls))]
        arrow_dy = [le[i,1]-ls[i,1] for ls,le in zip(lines_s,lines_e) for i in range(len(ls))]
        if normalize:
            dn = [ np.sqrt(ddx**2 + ddy**2) for ddx,ddy in zip(arrow_dx,arrow_dy)]
            arrow_dx = [ddx/ddn for ddx,ddn in zip(arrow_dx,dn)]
            arrow_dy = [ddy/ddn for ddy,ddn in zip(arrow_dy,dn)]
        return arrow_x,arrow_y,arrow_dx,arrow_dy

    # --- Main body of streamQuiver
    # Extracting lines
    seg = sp.lines.get_segments() # list of (2, 2) numpy arrays
    lines = seg_to_lines(seg) # list of (N,2) numpy arrays
    # Convert lines to arrows
    ar_x, ar_y, ar_dx, ar_dy = lines_to_arrows(lines,spacing=spacing,n=n,normalize=True)
    # Plot arrows
    qv=ax.quiver(ar_x, ar_y, ar_dx, ar_dy, *args, angles='xy', **kwargs)
    return qv

# --- Example
x = np.linspace(-1,1,100)
y = np.linspace(-1,1,100)
X,Y=np.meshgrid(x,y)
u = -np.sin(np.arctan2(Y,X))
v = np.cos(np.arctan2(Y,X))
xseed=np.linspace(0.1,1,4)
fig=plt.figure()
ax=fig.add_subplot(111)
sp = ax.streamplot(x,y,u,v,color='k',arrowstyle='-',start_points=np.array([xseed,xseed*0]).T,density=30)
qv = streamQuiver(ax,sp,spacing=0.5, scale=60)
plt.show()

I'm not sure about just increasing the number of arrowheads - but you can increase the density of streamlines with the density parameter in the streamplot function, here's the documentation:
*density* : float or 2-tuple
Controls the closeness of streamlines. When `density = 1`, the domain
is divided into a 30x30 grid---*density* linearly scales this grid.
Each cell in the grid can have, at most, one traversing streamline.
For different densities in each direction, use [density_x, density_y].
Here is an example:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0,20,1)
y = np.arange(0,20,1)
u=np.random.random((x.shape[0], y.shape[0]))
v=np.random.random((x.shape[0], y.shape[0]))
fig, ax = plt.subplots(2,2)
ax[0,0].streamplot(x,y,u,v,density=1)
ax[0,0].set_title('Original')
ax[0,1].streamplot(x,y,u,v,density=4)
ax[0,1].set_xlim(5,10)
ax[0,1].set_ylim(5,10)
ax[0,1].set_title('Zoomed, higher density')
ax[1,1].streamplot(x,y,u,v,density=1)
ax[1,1].set_xlim(5,10)
ax[1,1].set_ylim(5,10)
ax[1,1].set_title('Zoomed, same density')
ax[1,0].streamplot(x,y,u,v,density=4)
ax[1,0].set_title('Original, higher density')
fig.show()

I have found a way to customize the number of arrowheads on a streamline plot.
The idea is to plot the streamlines and the arrows separately:
plt.streamplot returns a stream_container with two attributes: lines and arrows. The lines attribute contains line segments that can be used to reconstruct the streamlines without arrows.
plt.quiver can be used to plot gradient fields. With the proper scaling, the length of the arrows is negligible, leaving only the arrowheads.
Thus, we only need to define the positions of the arrows using the line segments and pass them to plt.quiver.
Here is a toy example:
import matplotlib.pyplot as plt
from matplotlib import collections as mc
import numpy as np
# get line segments
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
sp = ax.streamplot(x, y, u, v, start_points=start_points, density=10)
seg = sp.lines.get_segments() # seg is a list of (2, 2) numpy arrays
lc = mc.LineCollection(seg, ...)
# define arrows
# here I define one arrow every 50 segments
# you could also select segs based on some criterion, e.g. intersect with certain lines
period = 50
arrow_x = np.array([seg[i][0, 0] for i in range(0, len(seg), period)])
arrow_y = np.array([seg[i][0, 1] for i in range(0, len(seg), period)])
arrow_dx = np.array([seg[i][1, 0] - seg[i][0, 0] for i in range(0, len(seg), period)])
arrow_dy = np.array([seg[i][1, 1] - seg[i][0, 1] for i in range(0, len(seg), period)])
# plot the final streamline
fig = plt.figure(figsize=(12.8, 10.8))
ax = fig.add_subplot(1, 1, 1)
ax.add_collection(lc)
ax.autoscale()
ax.quiver(
    arrow_x, arrow_y, arrow_dx, arrow_dy, angles='xy', # arrow position
    scale=0.2, scale_units='inches', units='y', minshaft=0, # arrow scaling
    headwidth=6, headlength=10, headaxislength=9) # arrow style
fig.show()
There is more than one way to scale the arrows so that they appear to have zero length.

Transformations in matplotlib

I'm trying to draw objects (lines/patches) with a fixed size (in device coordinates) at a certain position (in data coordinates). This behavior is akin to markers and the tips of annotation arrows, both of which are (size-) invariant under zoom and pan.
Why does the following example not work as expected?
The expected output is two crossed lines forming the diagonals of a 50x50 point square (device coordinates). The left lower corner of said square should be at point (1,0) in data coordinates.
While the computed points appear to be correct, the second diagonal is simply not visible.
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import matplotlib.path as mpath
import matplotlib.transforms as mtrans
import matplotlib as mpl
import numpy as np
class FixedPointOffsetTransform(mtrans.Transform):
    """
    Always returns the same transformed point plus
    the given point in device coordinates as an offset.
    """
    def __init__(self, trans, fixed_point):
        mtrans.Transform.__init__(self)
        self.input_dims = self.output_dims = 2
        self.has_inverse = False
        self.trans = trans
        self.fixed_point = np.array(fixed_point).reshape(1, 2)

    def transform(self, values):
        fp = self.trans.transform(self.fixed_point)
        values = np.array(values)
        if values.ndim == 1:
            return fp.flatten() + values
        else:
            return fp + values

fig , ax = plt.subplots(1,1)
ax.set_xlim([-1,10])
ax.set_ylim([-1,10])
# this transformation shifts the input by the given offset
# the offset is transformed with the given transformation
# and then added to the input
fixed_pt_trans = FixedPointOffsetTransform(ax.transData, (1, 0))
# these values are in device coordinates i.e. points
height = 50
width = 50
# two points in device coordinates, that are modified with the above transformation
A = fixed_pt_trans.transform((0,0))
B = fixed_pt_trans.transform((width,height))
l1 = mpl.lines.Line2D([A[0],B[0]], [A[1],B[1]])
ax.add_line(l1)
# already in device coordinates with the offset applied,
# no further transformation necessary
l1.set_transform(None)
print(A)
print(B)
print(l1.get_transform().transform(A))
print(l1.get_transform().transform(B))
# two points in device coordinates (unmodified)
A = (width,0)
B = (0,height)
l2 = mpl.lines.Line2D([A[0],B[0]], [A[1],B[1]])
ax.add_line(l2)
# apply transformation to add offset
l2.set_transform(fixed_pt_trans)
print(l2.get_transform().transform(A))
print(l2.get_transform().transform(B))
fig.show()

According to matplotlib API Changes documentation, starting with matplotlib 1.2.x:
Transform subclassing behaviour is now subtly changed. If your transform implements a non-affine transformation, then it should override the transform_non_affine method, rather than the generic transform method.
Therefore, simply reimplementing the transform_non_affine instead of the transform method, as said above, in the FixedPointOffsetTransform class seems to solve the issue:
class FixedPointOffsetTransform(mtrans.Transform):
    """
    Always returns the same transformed point plus
    the given point in device coordinates as an offset.
    """
    def __init__(self, trans, fixed_point):
        mtrans.Transform.__init__(self)
        self.input_dims = self.output_dims = 2
        self.has_inverse = False
        self.trans = trans
        self.fixed_point = np.array(fixed_point).reshape(1, 2)

    def transform_non_affine(self, values):
        fp = self.trans.transform(self.fixed_point)
        values = np.array(values)
        if values.ndim == 1:
            return fp.flatten() + values
        else:
            return fp + values
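As a different route to the same goal, and not the approach taken in the answer above, Matplotlib's built-in ScaledTranslation can place a fixed-size shape at a data position without a custom Transform subclass. The sketch below is an assumption-laden illustration: it works in inches through fig.dpi_scale_trans, so the 50 points from the question are written as 50/72 inch, and the names anchor, size, diag1 and diag2 are invented here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
import matplotlib.transforms as mtrans

fig, ax = plt.subplots()
ax.set_xlim(-1, 10)
ax.set_ylim(-1, 10)

# Interpret coordinates in inches (fig.dpi_scale_trans), then shift the result
# so that the local origin sits at the data point (1, 0).
anchor = (1, 0)
trans = fig.dpi_scale_trans + mtrans.ScaledTranslation(anchor[0], anchor[1], ax.transData)

size = 50 / 72  # 50 points expressed in inches

# The two diagonals of the square, given in inches relative to the anchor
diag1 = mlines.Line2D([0, size], [0, size], transform=trans)
diag2 = mlines.Line2D([size, 0], [0, size], transform=trans)
ax.add_line(diag1)
ax.add_line(diag2)

fig.canvas.draw()  # render once; use fig.savefig(...) to write a file
```

Because the translation is expressed through ax.transData, the cross should follow the data point under pan and zoom while keeping its on-screen size.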

Remove data points below a curve with python

I need to compare some theoretical data with real data in python.
The theoretical data comes from resolving an equation.
To improve the comparison, I would like to remove data points that fall far from the theoretical curve; that is, the points below and above the red dashed lines in the figure (made with matplotlib).
Both the theoretical curves and the data points are arrays of different length.
I can try to remove the points by eye. For example, the first upper point can be detected using:
data2[(data2.redshift<0.4)&(data2.dmodulus>1)]
rec.array([('1997o', 0.374, 1.0203223485103787, 0.44354759972859786)], dtype=[('SN_name', '|S10'), ('redshift', '<f8'), ('dmodulus', '<f8'), ('dmodulus_error', '<f8')])
But I would like a less ad hoc approach.
So, can anyone help me find an easy way of removing the problematic points?
Thank you!

This might be overkill and is based on your comment
Both the theoretical curves and the data points are arrays of
different length.
I would do the following:
1. Truncate the data set so that its x values lie within the min and max values of the theoretical set.
2. Interpolate the theoretical curve using scipy.interpolate.interp1d at the truncated data x values. The reason for step (1) is to satisfy the constraints of interp1d.
3. Use numpy.where to find data y values that are outside the range of acceptable theory values.
4. Don't discard these values, as was suggested in comments and other answers. For clarity, you can point them out by plotting the 'inliers' in one color and the 'outliers' in another.
Here's a script that is close to what you are looking for, I think. It hopefully will help you accomplish what you want:
import numpy as np
import scipy.interpolate as interpolate
import matplotlib.pyplot as plt

# make up data
def makeUpData():
    '''Make many more data points (x,y,yerr) than theory (x,y),
    with theory yerr corresponding to a constant "sigma" in y,
    about x,y value'''
    NX= 150
    dataX = (np.random.rand(NX)*1.1)**2
    dataY = (1.5*dataX+np.random.rand(NX)**2)*dataX
    dataErr = np.random.rand(NX)*dataX*1.3
    theoryX = np.arange(0,1,0.1)
    theoryY = theoryX*theoryX*1.5
    theoryErr = 0.5
    return dataX,dataY,dataErr,theoryX,theoryY,theoryErr

def makeSameXrange(theoryX,dataX,dataY):
    '''
    Truncate the dataX and dataY ranges so that dataX min and max are within
    the max and min of theoryX.
    '''
    minT,maxT = theoryX.min(),theoryX.max()
    goodIdxMax = np.where(dataX<maxT)
    goodIdxMin = np.where(dataX[goodIdxMax]>minT)
    return (dataX[goodIdxMax])[goodIdxMin],(dataY[goodIdxMax])[goodIdxMin]

# take 'theory' and get values at every 'data' x point
def theoryYatDataX(theoryX,theoryY,dataX):
    '''For every dataX point, find interpolated theoryY value. theoryX needed
    for interpolation.'''
    f = interpolate.interp1d(theoryX,theoryY)
    return f(dataX[np.where(dataX<np.max(theoryX))])

# collect valid points
def findInlierSet(dataX,dataY,interpTheoryY,theoryErr):
    '''Find where theoryY-theoryErr < dataY < theoryY+theoryErr and return
    the valid points.'''
    withinUpper = np.where(dataY<(interpTheoryY+theoryErr))
    withinLower = np.where(dataY[withinUpper]
                           >(interpTheoryY[withinUpper]-theoryErr))
    return (dataX[withinUpper])[withinLower],(dataY[withinUpper])[withinLower]

def findOutlierSet(dataX,dataY,interpTheoryY,theoryErr):
    '''Find where dataY > theoryY+theoryErr or dataY < theoryY-theoryErr and
    return the points above and below the band.'''
    withinUpper = np.where(dataY>(interpTheoryY+theoryErr))
    withinLower = np.where(dataY<(interpTheoryY-theoryErr))
    return (dataX[withinUpper],dataY[withinUpper],
            dataX[withinLower],dataY[withinLower])

if __name__ == "__main__":
    dataX,dataY,dataErr,theoryX,theoryY,theoryErr = makeUpData()
    TruncDataX,TruncDataY = makeSameXrange(theoryX,dataX,dataY)
    interpTheoryY = theoryYatDataX(theoryX,theoryY,TruncDataX)
    inDataX,inDataY = findInlierSet(TruncDataX,TruncDataY,interpTheoryY,
                                    theoryErr)
    outUpX,outUpY,outDownX,outDownY = findOutlierSet(TruncDataX,
                                                     TruncDataY,
                                                     interpTheoryY,
                                                     theoryErr)
    #print inlierIndex
    fig = plt.figure()
    ax = fig.add_subplot(211)
    ax.errorbar(dataX,dataY,dataErr,fmt='.',color='k')
    ax.plot(theoryX,theoryY,'r-')
    ax.plot(theoryX,theoryY+theoryErr,'r--')
    ax.plot(theoryX,theoryY-theoryErr,'r--')
    ax.set_xlim(0,1.4)
    ax.set_ylim(-.5,3)
    ax = fig.add_subplot(212)
    ax.plot(inDataX,inDataY,'ko')
    ax.plot(outUpX,outUpY,'bo')
    ax.plot(outDownX,outDownY,'ro')
    ax.plot(theoryX,theoryY,'r-')
    ax.plot(theoryX,theoryY+theoryErr,'r--')
    ax.plot(theoryX,theoryY-theoryErr,'r--')
    ax.set_xlim(0,1.4)
    ax.set_ylim(-.5,3)
    fig.savefig('findInliers.png')
This figure is the result:

In the end I used some of Yann's code:
def theoryYatDataX(theoryX,theoryY,dataX):
    '''For every dataX point, find interpolated theoryY value. theoryX needed
    for interpolation.'''
    f = interpolate.interp1d(theoryX,theoryY)
    return f(dataX[np.where(dataX<np.max(theoryX))])

def findOutlierSet(data,interpTheoryY,theoryErr):
    '''Split data into points inside and outside the band
    theoryY-theoryErr < dmodulus < theoryY+theoryErr.'''
    up = np.where(data.dmodulus > (interpTheoryY+theoryErr))
    low = np.where(data.dmodulus < (interpTheoryY-theoryErr))
    # join all the index together in a flat array
    out = np.hstack([up,low]).ravel()
    index = np.array(np.ones(len(data),dtype=bool))
    index[out]=False
    datain = data[index]
    dataout = data[out]
    return datain, dataout

def selectdata(data,theoryX,theoryY):
    """
    Data selection: z<1 and +-0.5 LFLRW separation
    """
    # Select data with redshift z<1
    data1 = data[data.redshift < 1]
    # From modulus to light distance:
    data1.dmodulus, data1.dmodulus_error = modulus2distance(data1.dmodulus,data1.dmodulus_error)
    # redshift data order
    data1.sort(order='redshift')
    # Outliers: distance to LFLRW curve bigger than +-0.5
    theoryErr = 0.5
    # Theory curve Interpolation to get the same points as data
    interpy = theoryYatDataX(theoryX,theoryY,data1.redshift)
    datain, dataout = findOutlierSet(data1,interpy,theoryErr)
    return datain, dataout
Using those functions I can finally obtain:
Thank you all for your help.

Just look at the difference between the red curve and the points; if it is bigger than the difference between the red curve and the dashed red curve, remove the point.
diff=np.abs(points-red_curve)
index= (diff<=np.abs(dashed_curve-red_curve))
filtered=points[index]
But please take the comment from NickLH seriously. Your data looks pretty good without any filtering; your "outliers" all have a very big error and won't affect the fit much.
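Since the theory curve and the data have different lengths, the comparison above only works once the curve has been evaluated at the data's x positions. A minimal NumPy-only sketch of that step follows. It uses np.interp rather than scipy's interp1d, and every array and the tolerance value are invented for illustration. Note that np.interp expects theory_x to be increasing and clamps x values outside its range to the end values.

```python
import numpy as np

# Hypothetical stand-ins: a coarse theory curve and denser, noisier data
theory_x = np.linspace(0.0, 1.0, 11)
theory_y = 1.5 * theory_x**2
tolerance = 0.5  # half-width of the accepted band (the dashed lines)

rng = np.random.default_rng(1)
data_x = rng.uniform(0.0, 1.0, 200)
data_y = 1.5 * data_x**2 + rng.normal(scale=0.4, size=data_x.size)

# Evaluate the theory curve at every data x value
theory_at_data = np.interp(data_x, theory_x, theory_y)

# Boolean mask: True where the point lies inside the band
inlier = np.abs(data_y - theory_at_data) <= tolerance

in_x, in_y = data_x[inlier], data_y[inlier]
out_x, out_y = data_x[~inlier], data_y[~inlier]
print(f"{inlier.sum()} inliers, {(~inlier).sum()} outliers")
```

Keeping the mask instead of deleting rows makes it easy to plot both groups in different colors before deciding whether filtering is justified.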

You could either use numpy.where() to identify which x,y pairs meet your plotting criteria, or use enumerate to do much the same thing. Example:
x_list = [ 1, 2, 3, 4, 5, 6 ]
y_list = ['f','o','o','b','a','r']
result = [y_list[i] for i, x in enumerate(x_list) if 2 <= x < 5]
print(result)
I'm sure you could change the conditions so that '2' and '5' in the above example are replaced by the functions of your curves.

Graphing a line and scatter points using Matplotlib?

I'm using matplotlib at the moment to try to visualise some data I am working on. I'm trying to plot around 6500 points and the line y = x on the same graph but am having some trouble doing so. I can only seem to get the points to render and not the line itself. I know matplotlib doesn't plot equations as such, just sets of points, so I'm trying to use an identical set of points for the x and y co-ordinates to produce the line.
The following is my code
from matplotlib import pyplot
import numpy
from pymongo import *
class Store(object):
    """docstring for Store"""
    def __init__(self):
        super(Store, self).__init__()
        c = Connection()
        ucd = c.ucd
        self.tweets = ucd.tweets

    def fetch(self):
        x = []
        y = []
        for t in self.tweets.find():
            x.append(t['positive'])
            y.append(t['negative'])
        return [x,y]

if __name__ == '__main__':
    c = Store()
    array = c.fetch()
    t = numpy.arange(0., 0.03, 1)
    pyplot.plot(array[0], array[1], 'ro', t, t, 'b--')
    pyplot.show()
Any suggestions would be appreciated,
Patrick

Correct me if I'm wrong (I'm not a pro at matplotlib), but 't' will simply get the value [0.].
t = numpy.arange(0.,0.03,1)
That means start at 0 and go to 0.03 (not inclusive) with a step size of 1. Resulting in an array containing just 0.
In that case you are simply plotting one point. It takes two to make a line.
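The point about the step size can be checked directly. The short sketch below shows the single-element result and one possible replacement using numpy.linspace, which takes a number of samples instead of a step. The names t_bad, t_ends and t_dense are made up here, and the 0.03 upper limit is simply carried over from the question.

```python
import numpy as np

# start=0, stop=0.03, step=1: only the start value fits before the stop
t_bad = np.arange(0., 0.03, 1)
print(t_bad)  # [0.]

# linspace takes a sample count; two points are enough for a straight line
t_ends = np.linspace(0., 0.03, 2)
print(t_ends)

# More samples are harmless if a denser line is preferred
t_dense = np.linspace(0., 0.03, 50)
print(len(t_dense))  # 50
```

Passing t_ends (or t_dense) as both the x and y arguments to pyplot.plot draws the line y = x over that range.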
