How to modify the sampling bins for Windrose plots in Python?

I am drawing a windrose plot using a library called windrose.
Everything works perfectly, except for one part. Here is an example:
from windrose import WindroseAxes
from matplotlib import pyplot as plt
import matplotlib.cm as cm
import numpy as np
# Create wind speed and direction variables
ws = np.random.random(500) * 6
wd = np.random.random(500) * 360
ax = WindroseAxes.from_ax()
ax.bar(wd, ws, normed=True, opening=0.8, edgecolor='white')
ax.set_legend()
And based on the data that I have in df I get this plot:
However, the desired output on the same data should look like this one:
What is the difference? So, if you look at the first figure, you see that the degree bins do not start from 0 degrees. In fact, if you look at the bar on N, you see that it is halved. However, the second figure starts exactly at 0 degrees, with the first bar starting exactly at N.
Is there any way that I can shift the bins in windrose so that they start exactly at 0 degrees?
EDIT: I updated the code above to be a simple, verifiable sample. Because the data is randomly generated, the figures do not necessarily match what this code draws, but they still illustrate what I am looking for.
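One possible workaround, sketched here without relying on windrose's own binning (so this is a swapped-in technique, not the library's API): bin the directions yourself with np.histogram using edges that start at 0 degrees, then draw the sectors on a plain matplotlib polar axis with align="edge". The function name sector_frequencies and the choice of 16 sectors are assumptions for illustration, and the sketch only shows direction frequencies, not the stacked speed classes that windrose draws.

```python
import numpy as np
import matplotlib.pyplot as plt


def sector_frequencies(wd, nsector=16):
    """Fraction of samples per direction sector; the first sector starts at 0 degrees."""
    edges = np.linspace(0.0, 360.0, nsector + 1)
    counts, _ = np.histogram(np.mod(wd, 360.0), bins=edges)
    return edges, counts / counts.sum()


rng = np.random.default_rng(0)
wd = rng.random(500) * 360  # synthetic wind directions in degrees

edges, freq = sector_frequencies(wd, nsector=16)
sector_width = np.deg2rad(360.0 / 16)

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.set_theta_zero_location("N")  # 0 degrees at the top
ax.set_theta_direction(-1)       # angles increase clockwise, like a compass
# align="edge" makes each bar start at its left bin edge instead of being centred on it
ax.bar(np.deg2rad(edges[:-1]), freq, width=0.8 * sector_width,
       align="edge", edgecolor="white")
# Call plt.show() or fig.savefig(...) to view the figure.
```

With align="edge" the first bar begins exactly at N rather than straddling it.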

Related

Polar pcolormesh shifts center when used set_ylim in matplotlib

This is only an excerpt of the code I am using, but it reproduces the problem I am facing. I am trying to plot the density of particles over a disc, so a polar plot seems natural. The following code plots a density array (random data stands in here for my density file) whose rows and columns represent radius and angular direction.
#! /usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
from os.path import exists
from os import sys
import matplotlib as mpl
from matplotlib import rc
NUMBINS=100
rmax=20.0
dR2=rmax*rmax/NUMBINS
density = np.random.random((NUMBINS, NUMBINS))
r = np.sqrt(np.arange(0,rmax*rmax,dR2) )[:NUMBINS]
theta = np.linspace(0,2*np.pi,NUMBINS)
mpl.rcParams['legend.fontsize'] = 10
mpl.rcParams['pcolor.shading'] ='nearest'
fig = plt.figure(figsize=(5, 5))
ax1 = plt.subplot(111,projection="polar")
rad, th = np.meshgrid(r,theta)
ax1.set_yticks(np.arange(0,rmax,3))
ax1.pcolormesh(th,rad,density,cmap='Blues')
#ax1.set_ylim([rad[0,0], rad[0,NUMBINS-1]])
plt.tight_layout()
plt.show()
which gives me the following plot :
As you can see, the radius runs from 0 to rmax, so uncommenting the line
ax1.set_ylim([rad[0,0], rad[0,NUMBINS-1]])
should not have any effect on the plot, yet it shifts the center of the plot:
I don't understand why setting ymin=0 creates this white space in the center.
It turns out that this is a problem with the installed version of matplotlib. I tried a different version and the plot works as expected. Apologies for not trying that earlier.

Plot scatter graphs with matplotlib subplot

I am trying to draw a scatter plot that takes multiple arrays as input but plots them all on a single graph.
Here is my code:
import numpy as np
import os
import matplotlib.pyplot as plt
ax = plt.gca()
n_p=np.array([17.2,25.7,6.1,0.9,0.5,0.2])
n_d=np.array([1,2,3])
a_p=np.array([4.3,1.4,8.1,1.8,7.9,7.0])
a_d=np.array([12,13,14])
ax.scatter = ([n_d[0]/n_d[1]],[n_p[0]/n_p[1]])
ax.scatter = ([a_d[0]/a_d[1]],[a_p[0]/a_p[1]])
I will read the arrays from a CSV file (which is why I import os); here I just use a simple example. I want to plot the ratio of element 2 to element 1 of n_p on the x-axis and the same ratio for n_d on the y-axis, which gives one point on the graph. The same operation is then applied to the a_p and a_d arrays, and that point is added to the graph. There will be more data to add, but two points are enough to understand the process.
I tried to follow the example from here.
If I use the color argument, I get a syntax error.
If I do not use color, I get a blank plot.
Sorry, I am a beginner, so the code is rather messy.
Thanks in advance.
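Remove the = from the function call! Writing ax.scatter = (...) assigns a tuple to the name ax.scatter instead of calling the method, so nothing is drawn.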
import numpy as np
import os
import matplotlib.pyplot as plt
ax = plt.gca()
n_p=np.array([17.2,25.7,6.1,0.9,0.5,0.2])
n_d=np.array([1,2,3])
a_p=np.array([4.3,1.4,8.1,1.8,7.9,7.0])
a_d=np.array([12,13,14])
ax.scatter([n_d[0]/n_d[1]],[n_p[0]/n_p[1]])
ax.scatter([a_d[0]/a_d[1]],[a_p[0]/a_p[1]])
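To keep adding points as more arrays are read, one option is to collect the ratios in lists and pass them to a single scatter call, with one colour per point. This is only a sketch: the helper name ratio and its convention (second element divided by the first, as described in the question's text) are assumptions, and the colour names are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt


def ratio(arr):
    """Second element divided by the first (assumed convention)."""
    return arr[1] / arr[0]


n_p = np.array([17.2, 25.7, 6.1, 0.9, 0.5, 0.2])
n_d = np.array([1, 2, 3])
a_p = np.array([4.3, 1.4, 8.1, 1.8, 7.9, 7.0])
a_d = np.array([12, 13, 14])

# One (x, y) point per pair of arrays: x from the *_p array, y from the *_d array
xs, ys = [], []
for p, d in [(n_p, n_d), (a_p, a_d)]:
    xs.append(ratio(p))
    ys.append(ratio(d))

fig, ax = plt.subplots()
ax.scatter(xs, ys, c=["red", "blue"])  # one colour per point
# Call plt.show() or fig.savefig(...) to view the figure.
```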

How to plot a line graph of density over a density colour map plot in Python

First time user so apologies for any mistakes.
I have some code (pasted below) which is used to analyse and gain values/graphs from a simulation I have run.
This results in the following image:
I would now like to plot a line graph on top of this, showing the colour-map value along r = 0 on the y-axis for every point on the x-axis. However, I'm completely lost on where to begin. I've tried looking into KDE and similar things, but I realise I'm not sure how to get at the numerical values that were used to generate the colour map.
from openpmd_viewer import OpenPMDTimeSeries
from openpmd_viewer.addons import LpaDiagnostics
import numpy as np
from scipy.constants import c, e, m_e
import matplotlib.pyplot as plt
from matplotlib import gridspec
# Replace the string below, to point to your data
ts = OpenPMDTimeSeries(r"/Users/bentorrance/diags/hdf5/")
ts_2d = LpaDiagnostics(r"/Users/bentorrance/diags/hdf5/")
plt.figure(1)
Ez = ts.get_field(iteration=5750, field='E', coord='z', plot=True, cmap='inferno')
plt.title(r'Electric Field Density $E_{z}$')
plt.show()
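A possible starting point, assuming the colour map is drawn from a 2D array whose rows correspond to r and whose columns correspond to z: take the row closest to r = 0 and plot it against z. The sketch below uses synthetic data; the names field, r, z and lineout_at are placeholders, and how the array and its coordinates are obtained from the reader is not shown.

```python
import numpy as np


def lineout_at(field, r, r0=0.0):
    """Return the row of `field` whose r coordinate is closest to r0."""
    row = np.argmin(np.abs(r - r0))
    return field[row, :]


# Synthetic stand-in for the simulation data (assumed layout: field[r_index, z_index])
r = np.linspace(-5.0, 5.0, 101)
z = np.linspace(0.0, 20.0, 200)
field = np.exp(-r[:, None] ** 2) * np.sin(z[None, :])

on_axis = lineout_at(field, r, r0=0.0)
# on_axis has one value per z position and can be drawn with plt.plot(z, on_axis).
```

Because the field values and the r axis usually have different scales, drawing the line on a twin y-axis (ax.twinx()) is one way to overlay it on the colour map.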

Cumulative probability plots in Matplotlib

How would I make a plot of this style in python with matplotlib? (Cumulative probability plot) I don't need complete code, mostly just need a place to start and a general idea of what I need to do for it.
A cumulative probability plot is really easy to make:
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
fig,ax = plt.subplots()
ax.plot(np.sort(data),np.linspace(0.0,1.0,len(data)))
plt.xlabel(r'$x$')
plt.ylabel(r'$P(X \leq x)$')
plt.show()
Note that it can have a strong advantage over a probability density plot as it does not require binning of your data. (Should you be looking for the latter you can check this code).
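The same sorted-data idea can be wrapped in a small helper that also lets you read off P(X <= x) for a given x. This is a minimal sketch using the common i/n convention for the empirical CDF, which differs slightly from the linspace(0, 1, n) used above; the name ecdf is an assumption.

```python
import numpy as np


def ecdf(data):
    """Return the sorted values and the empirical cumulative probabilities i/n."""
    x = np.sort(np.asarray(data))
    p = np.arange(1, len(x) + 1) / len(x)
    return x, p


x, p = ecdf([3.0, 1.0, 2.0, 4.0])
# Fraction of samples <= 2.5, found with a binary search on the sorted values
frac = np.searchsorted(x, 2.5, side="right") / len(x)
```

Plotting p against x (for example with a step plot) gives the cumulative probability curve.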

Scatter plot with a huge amount of data

I would like to use Matplotlib to generate a scatter plot with a huge amount of data (about 3 million points). I have 3 vectors of the same length, and I plot them in the following way.
import matplotlib.pyplot as plt
import numpy as np
from numpy import *
from matplotlib import rc
import pylab
from pylab import *
fig = plt.figure()
fig.subplots_adjust(bottom=0.2)
ax = fig.add_subplot(111)
plt.scatter(delta,vf,c=dS,alpha=0.7,cmap=cm.Paired)
Nothing special. But it takes too long to generate (I'm working on a MacBook Pro with 4 GB RAM, Python 2.7 and Matplotlib 1.0). Is there any way to improve the speed?
Unless your graphic is huge, many of those 3 million points are going to overlap.
(A 400x600 image only has 240K dots...)
So the easiest thing to do would be to take a sample of say, 1000 points, from your data:
import random
delta_sample=random.sample(list(delta),1000)  # recent Python versions require a sequence here, not an ndarray
and just plot that.
For example:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
import random
fig = plt.figure()
fig.subplots_adjust(bottom=0.2)
ax = fig.add_subplot(111)
N=3*10**6
delta=np.random.normal(size=N)
vf=np.random.normal(size=N)
dS=np.random.normal(size=N)
idx=random.sample(range(N),1000)
plt.scatter(delta[idx],vf[idx],c=dS[idx],alpha=0.7,cmap=cm.Paired)
plt.show()
Or, if you need to pay more attention to outliers, then perhaps you could bin your data using np.histogram, and then compose a delta_sample which has representatives from each bin.
Unfortunately, when using np.histogram I don't think there is any easy way to associate bins with individual data points. A simple, but approximate solution is to use the location of a point in or on the bin edge itself as a proxy for the points in it:
xedges=np.linspace(-10,10,100)
yedges=np.linspace(-10,10,100)
zedges=np.linspace(-10,10,10)
hist,edges=np.histogramdd((delta,vf,dS), (xedges,yedges,zedges))
xidx,yidx,zidx=np.where(hist>0)
plt.scatter(xedges[xidx],yedges[yidx],c=zedges[zidx],alpha=0.7,cmap=cm.Paired)
plt.show()
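If one representative per occupied bin is what you want, np.digitize does return the bin index of every point, so actual data points (rather than bin edges) can be kept. A sketch in 2D; the helper name one_point_per_bin and the bin edges are assumptions, and the sample size is reduced to keep it quick.

```python
import numpy as np


def one_point_per_bin(x, y, xedges, yedges):
    """Indices of one sample (the first encountered) from each occupied 2D bin."""
    ix = np.digitize(x, xedges)  # values in 0 .. len(xedges)
    iy = np.digitize(y, yedges)  # values in 0 .. len(yedges)
    flat = ix * (len(yedges) + 1) + iy  # unique id per (ix, iy) pair
    _, first = np.unique(flat, return_index=True)
    return first


rng = np.random.default_rng(0)
N = 100_000
delta = rng.normal(size=N)
vf = rng.normal(size=N)
dS = rng.normal(size=N)

edges = np.linspace(-10, 10, 100)
idx = one_point_per_bin(delta, vf, edges, edges)
# delta[idx], vf[idx] and dS[idx] can then be passed to plt.scatter as before.
```

This keeps outliers, since every occupied bin contributes a point no matter how few samples fall in it.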
What about trying pyplot.hexbin? It generates a sort of heatmap based on point density in a set number of bins.
You could take the heatmap approach shown here. In this example the color represents the quantity of data in the bin, not the median value of the dS array, but that should be easy to change. More later if you are interested.
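A sketch of the hexbin idea, coloured by the median of dS in each cell rather than by point count, using hexbin's C and reduce_C_function arguments. The data is synthetic and the gridsize is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N = 300_000  # smaller than 3 million to keep the example quick
delta = rng.normal(size=N)
vf = rng.normal(size=N)
dS = rng.normal(size=N)

fig, ax = plt.subplots()
# C supplies the values to aggregate; reduce_C_function combines them per hexagon
hb = ax.hexbin(delta, vf, C=dS, reduce_C_function=np.median,
               gridsize=60, cmap="Paired")
fig.colorbar(hb, ax=ax, label="median dS per cell")
# Call plt.show() or fig.savefig(...) to view the figure.
```

Only one polygon per occupied cell is drawn, so the cost of rendering no longer grows with the number of points.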
