I am trying to plot a scatter diagram. It will take multiple arrays as input but plot them all on a single graph.
Here is my code:
import numpy as np
import os
import matplotlib.pyplot as plt
ax = plt.gca()
n_p=np.array([17.2,25.7,6.1,0.9,0.5,0.2])
n_d=np.array([1,2,3])
a_p=np.array([4.3,1.4,8.1,1.8,7.9,7.0])
a_d=np.array([12,13,14])
ax.scatter = ([n_d[0]/n_d[1]],[n_p[0]/n_p[1]])
ax.scatter = ([a_d[0]/a_d[1]],[a_p[0]/a_p[1]])
I will read the arrays from a CSV file; here I just put in a simple example (that is why I imported os). I want to plot the ratio element 2 / element 1 of n_p (as the x-axis) and the same ratio for n_d (as the y-axis). This gives one point on the graph. The same operation will then be applied to the a_p and a_d arrays, and that point will be added to the graph. There will be more data to add, but two pairs are enough to understand the process.
I tried to follow the example from here.
If I use the color argument, I get a syntax error.
If I do not use color, I get a blank plot.
Sorry, I'm a beginner, so the code is rather messy.
Thanks in advance.
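Remove the = from the function call! ax.scatter = (...) does not call scatter; it replaces the ax.scatter method with a tuple, so nothing is ever drawn. (That is probably also where the syntax error came from: a keyword argument such as color='r' is not valid inside a plain tuple.)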
import numpy as np
import os
import matplotlib.pyplot as plt
ax = plt.gca()
n_p=np.array([17.2,25.7,6.1,0.9,0.5,0.2])
n_d=np.array([1,2,3])
a_p=np.array([4.3,1.4,8.1,1.8,7.9,7.0])
a_d=np.array([12,13,14])
# scatter must be called like a function; x is the n_d/a_d ratio, y the n_p/a_p ratio
ax.scatter([n_d[0]/n_d[1]],[n_p[0]/n_p[1]])
ax.scatter([a_d[0]/a_d[1]],[a_p[0]/a_p[1]])
plt.show()
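Since the question mentions more data to add later, here is a minimal sketch of one way to scale this up, assuming the arrays come in (p, d) pairs. It keeps the pairs in a dict (the names pairs, xs and ys are hypothetical), computes the same element-0 / element-1 ratios as the code above, calls ax.scatter once per pair, and passes color as a keyword argument inside the call rather than after an =.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical container of (p, d) array pairs; in practice these would be read from CSV files
pairs = {
    "n": (np.array([17.2, 25.7, 6.1, 0.9, 0.5, 0.2]), np.array([1, 2, 3])),
    "a": (np.array([4.3, 1.4, 8.1, 1.8, 7.9, 7.0]), np.array([12, 13, 14])),
}

fig, ax = plt.subplots()
xs, ys = [], []
for label, (p, d) in pairs.items():
    x = d[0] / d[1]  # ratio of the first two elements of the *_d array
    y = p[0] / p[1]  # ratio of the first two elements of the *_p array
    xs.append(x)
    ys.append(y)
    # color is a keyword argument of the call, so it goes inside the parentheses
    ax.scatter(x, y, color="tab:blue")
    ax.annotate(label, (x, y))  # label each point with its pair name

ax.set_xlabel("d[0] / d[1]")
ax.set_ylabel("p[0] / p[1]")
plt.show()
```

Every new pair simply becomes another dict entry, and all points land on the same axes.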
First time user so apologies for any mistakes.
I have some code (pasted below) that I use to analyse a simulation I have run and to extract values and graphs from it.
This results in the following image:
I would now like to plot a line graph on top of this, showing the colour-map values along r = 0 on the y-axis, i.e. the value at every point on the x-axis. However, I'm completely lost on where to even begin. I've looked into KDE and similar techniques, but I realise I don't know how to get at the numerical values that were used to generate the colour map.
from openpmd_viewer import OpenPMDTimeSeries
from openpmd_viewer.addons import LpaDiagnostics
import numpy as np
from scipy.constants import c, e, m_e
import matplotlib.pyplot as plt
from matplotlib import gridspec
# Replace the string below, to point to your data
ts = OpenPMDTimeSeries(r"/Users/bentorrance/diags/hdf5/")
ts_2d = LpaDiagnostics(r"/Users/bentorrance/diags/hdf5/")
plt.figure(1)
Ez = ts.get_field(iteration=5750, field='E', coord='z', plot=True, cmap='inferno')
plt.title(r'Electric Field Density $E_{z}$')
plt.show()
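The numbers behind the colour map are simply the 2D field array. If I recall the openPMD-viewer API correctly, get_field returns a tuple (field array, metadata), so the code above stores that whole tuple in Ez, and the metadata exposes the axes as info.z and info.r. Treat that as an assumption to check against the openPMD-viewer documentation. With those arrays, the r = 0 lineout is the row whose r value is closest to 0. The sketch below uses a synthetic field with hypothetical values, assuming rows follow r and columns follow z, and overlays the lineout on a second y-axis with twinx.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the field map: rows follow r, columns follow z.
# (Assumption: with openPMD-viewer these would come from something like
#  F, info = ts.get_field(...), with r = info.r and z = info.z.)
z = np.linspace(0.0, 50e-6, 400)
r = np.linspace(-20e-6, 20e-6, 101)
F = np.exp(-(r[:, None] / 10e-6) ** 2) * np.sin(2 * np.pi * z[None, :] / 10e-6)

i0 = np.argmin(np.abs(r))  # index of the row whose r is closest to 0
on_axis = F[i0, :]         # field value at r = 0 for every z

fig, ax = plt.subplots()
im = ax.imshow(F, extent=[z[0], z[-1], r[0], r[-1]], origin="lower",
               aspect="auto", cmap="inferno")
fig.colorbar(im, ax=ax, pad=0.15, label="$E_z$")
ax.set_xlabel("z")
ax.set_ylabel("r")

# Second y-axis sharing z, so the lineout keeps its own scale
ax2 = ax.twinx()
ax2.plot(z, on_axis, color="cyan")
ax2.set_ylabel("$E_z$ at r = 0")
plt.show()
```

If the lineout should share the colour map's r axis instead, plot on_axis on a separate subplot below the map rather than with twinx.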
I have written the following code:
import numpy as np
import matplotlib.pyplot as plt
x=np.random.randint(0,10,[1,5])
y=np.random.randint(0,10,[1,5])
x.sort(),y.sort()
fig, ax=plt.subplots(figsize=(10,10))
ax.plot(x,y)
ax.set( title="random data plot", xlabel="x",ylabel="y")
I am getting a blank figure.
The same code draws a chart if I manually assign the values below to x and y instead of using the random function:
x=[1,2,3,4]
y=[11,22,33,44]
Am I missing something or doing something wrong?
x=np.random.randint(0,10,[1,5]) returns a 2D array of shape (1, 5) because you specified the shape as [1,5]. You probably want either x=np.random.randint(0,10,[1,5])[0] or x=np.random.randint(0,10,size=5); both give a 1D array of 5 values. See: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.randint.html
Matplotlib doesn't plot markers by default, only a line. As per Can's comment, matplotlib interprets your (1, 5) array as 5 different datasets (one per column), each with only 1 point, so no line is drawn because there is no second point to connect to.
If you add a marker to your plot call, you can see that the data is actually being plotted, just probably not as you intended:
import matplotlib.pyplot as plt
import numpy as np
x=np.random.randint(0,10,[1,5])
y=np.random.randint(0,10,[1,5])
x.sort(),y.sort()
fig, ax=plt.subplots(figsize=(10,10))
ax.plot(x,y, marker='.') # <<< marker for each point added here
ax.set( title="random data plot", xlabel="x",ylabel="y")
plt.show()
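For comparison, a minimal sketch of the fix described above: generate 1D arrays (size=5), or flatten an existing (1, 5) array with ravel(), so that plot receives a single dataset and draws one connected line.

```python
import numpy as np
import matplotlib.pyplot as plt

# size=5 gives a 1-D array of shape (5,), which plot() treats as one dataset
x = np.sort(np.random.randint(0, 10, size=5))
y = np.sort(np.random.randint(0, 10, size=5))

# Alternatively, flatten an existing (1, 5) array into shape (5,)
x2 = np.random.randint(0, 10, [1, 5]).ravel()

fig, ax = plt.subplots(figsize=(10, 10))
lines = ax.plot(x, y, marker=".")  # returns a list holding a single Line2D
ax.set(title="random data plot", xlabel="x", ylabel="y")
plt.show()
```

With the original (1, 5) inputs, the same call would return a list of 5 one-point lines instead of one.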
I have a data set with two independent variables and one dependent variable. I thought the best way to represent it is a checkerboard-type plot in which the color of each cell represents a range of values, like this:
I can't seem to find any code that does this automatically.
You need to use a plotting package to do this. For example, with matplotlib:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
X = 100*np.random.rand(6,6)
fig, ax = plt.subplots()
i = ax.imshow(X, cmap=cm.jet, interpolation='nearest')
fig.colorbar(i)
plt.show()
For anyone who comes across this years later, as I did: what the original poster wants is a heatmap.
Matplotlib's documentation has an example of this here.
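Since the question asks for cell colours that stand for ranges of values rather than a continuous scale, here is a minimal sketch, assuming the data is already on a regular grid (the grid a, b, Z and the levels list are hypothetical). It uses matplotlib.colors.BoundaryNorm with a colour map resampled to one colour per interval, and pcolormesh so the two independent variables become the axes.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm

# Hypothetical grid: two independent variables a, b and one dependent value Z
a = np.arange(6)  # first independent variable (columns)
b = np.arange(5)  # second independent variable (rows)
rng = np.random.default_rng(0)
Z = 100 * rng.random((len(b), len(a)))

# Each cell is coloured by the interval its value falls into
levels = [0, 20, 40, 60, 80, 100]
cmap = plt.get_cmap("viridis", len(levels) - 1)  # one colour per interval
norm = BoundaryNorm(levels, ncolors=cmap.N)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(a, b, Z, cmap=cmap, norm=norm, shading="nearest")
fig.colorbar(mesh, ax=ax, ticks=levels)
ax.set_xlabel("independent variable 1")
ax.set_ylabel("independent variable 2")
plt.show()
```

shading="nearest" centres each cell on its (a, b) value, which suits data sampled at grid points.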
I have two similar pieces of matplotlib codes that produce different results.
1:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0,10,100)
y = np.linspace(0,10,100)
y[10:40] = np.nan
plt.plot(x,y)
plt.savefig('fig')
2:
from pylab import *
x = linspace(0,10,100)
y = linspace(0,10,100)
y[10:40] = np.nan
plot(x,y)
savefig('fig')
Code #1 produces a straight line, with the NaN region filled in by a solid line of a different color.
Code #2 produces a figure with a straight line but does not fill in the NaN region with a line. Instead there is a gap there.
How can I make code #1 produce a gap in place of the NaNs, like code #2? I have been googling for a couple of days and have come up with nothing. Any help or advice would be appreciated. Thanks in advance.
Just to explain what's probably happening:
The two pieces of code you showed are equivalent; run on their own, they will always produce the same output. pylab is basically just a few lines of code that do the following (there's a bit more to it than this, but that's the basic idea):
from numpy import *
from matplotlib.mlab import *
from matplotlib.pyplot import *
There's absolutely no way for pylab.plot to refer to a different function than plt.plot.
However, if you just call plt.plot (or pylab.plot, they're the same function), it plots on the current figure.
If you plotted something on that figure before, it will still be there. (If you're familiar with MATLAB, matplotlib behaves as if hold('on') were always set. Older versions let you change this with plt.hold, which was deprecated in Matplotlib 2.0 and removed in 3.0; in any case it's better to be explicit in Python and just create a new figure.)
Basically, you probably did this:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0,10,100)
y = np.linspace(0,10,100)
plt.plot(x,y)
plt.savefig('fig')
And then, in the same interactive ipython session, you did this:
y[10:40] = np.nan
plt.plot(x, y)
plt.savefig('fig')
Because you didn't call show or create a new figure, the current figure is still the same one as before. The "full" line is still present beneath the second one, and the second line (the one with the NaNs) is a different color because it was plotted on the same axes.
This is one of the many reasons why it's a good idea to use the object-oriented interface. That way you're aware of exactly which axes and figure you're plotting on.
For example:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()   # a brand-new figure and axes every time
ax.plot(x, y)              # x and y as defined above
fig.savefig('test.png')
If you're not going to do that, at the very least always explicitly create a new figure and/or axes when you want a new figure (e.g. start by calling plt.figure()).
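A minimal sketch that reproduces the explanation above: two pyplot calls on the same current axes leave both lines in place, so the full line hides the gap; a fresh figure contains only the NaN line, and the gap is visible. The names y_gap, overlaid and fresh are just illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
y = np.linspace(0, 10, 100)

# Reproduce the accidental overlay: both calls draw on the same current axes
plt.figure()
plt.plot(x, y)                   # full line, first colour
y_gap = y.copy()
y_gap[10:40] = np.nan            # NaNs split the line into separate segments
plt.plot(x, y_gap)               # second line, next colour, drawn on top
overlaid = len(plt.gca().lines)  # 2 lines: the full line fills the "gap"

# The fix: start a fresh figure and axes for every new plot
fig, ax = plt.subplots()
ax.plot(x, y_gap)                # only the NaN line, so the gap is visible
fresh = len(ax.lines)            # 1 line
plt.show()
```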
I would like to use Matplotlib to generate a scatter plot with a huge amount of data (about 3 million points). I have 3 vectors of the same length, and I plot them as follows:
import matplotlib.pyplot as plt
import numpy as np
from numpy import *
from matplotlib import rc
import pylab
from pylab import *
fig = plt.figure()
fig.subplots_adjust(bottom=0.2)
ax = fig.add_subplot(111)
plt.scatter(delta,vf,c=dS,alpha=0.7,cmap=cm.Paired)
Nothing special. But it takes far too long to generate (I'm working on a MacBook Pro with 4 GB RAM, Python 2.7 and Matplotlib 1.0). Is there any way to improve the speed?
Unless your graphic is huge, many of those 3 million points are going to overlap.
(A 400x600 image only has 240K dots...)
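So the easiest thing to do would be to take a sample of, say, 1000 points from your data:
import random
idx = random.sample(range(len(delta)), 1000)  # same indices for delta, vf and dS (NumPy arrays)
delta_sample = delta[idx]
and plot that (together with vf[idx] and dS[idx], so the three vectors stay paired).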
For example:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
import random
fig = plt.figure()
fig.subplots_adjust(bottom=0.2)
ax = fig.add_subplot(111)
N=3*10**6
delta=np.random.normal(size=N)
vf=np.random.normal(size=N)
dS=np.random.normal(size=N)
idx=random.sample(range(N),1000)
plt.scatter(delta[idx],vf[idx],c=dS[idx],alpha=0.7,cmap=cm.Paired)
plt.show()
Or, if you need to pay more attention to outliers, then perhaps you could bin your data using np.histogram, and then compose a delta_sample which has representatives from each bin.
Unfortunately, with np.histogram I don't think there is any easy way to associate bins with individual data points. A simple but approximate solution is to use the location of the bin edge itself as a proxy for the points in that bin:
xedges=np.linspace(-10,10,100)
yedges=np.linspace(-10,10,100)
zedges=np.linspace(-10,10,10)
hist,edges=np.histogramdd((delta,vf,dS), (xedges,yedges,zedges))
xidx,yidx,zidx=np.where(hist>0)
plt.scatter(xedges[xidx],yedges[yidx],c=zedges[zidx],alpha=0.7,cmap=cm.Paired)
plt.show()
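If you want actual data points rather than bin edges, np.digitize does return the bin of each individual point, which makes the "one representative per bin" idea straightforward. A minimal sketch of that alternative (the 2D binning of delta and vf, the edges and the cell ids are assumptions for illustration; dS only sets the colour): np.unique with return_index=True then picks the first point found in each occupied cell.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N = 3 * 10**6
delta = rng.normal(size=N)
vf = rng.normal(size=N)
dS = rng.normal(size=N)

# 2-D bin index of every point; np.digitize returns values 0..len(edges)
edges = np.linspace(-10, 10, 100)
bx = np.digitize(delta, edges)
by = np.digitize(vf, edges)
cell = bx * (len(edges) + 1) + by  # unique integer id per (x-bin, y-bin) cell

# Index of the first point found in each occupied cell
_, first = np.unique(cell, return_index=True)

plt.scatter(delta[first], vf[first], c=dS[first], alpha=0.7, cmap="Paired")
plt.show()
```

Outliers survive because every occupied cell keeps one point, however sparse it is.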
What about trying pyplot.hexbin? It generates a sort of heatmap based on point density in a set number of bins.
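A minimal hexbin sketch on 3 million synthetic points (the random data stands in for delta and vf). Without a C argument, each hexagon is coloured by the number of points that fall into it; gridsize sets the number of hexagons across the x-direction.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N = 3 * 10**6
delta = rng.normal(size=N)
vf = rng.normal(size=N)

fig, ax = plt.subplots()
# Each hexagon's colour is the count of points inside it
hb = ax.hexbin(delta, vf, gridsize=100, cmap="inferno")
fig.colorbar(hb, ax=ax, label="points per hexagon")
plt.show()
```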
You could take the heatmap approach shown here. In this example the color represents the quantity of data in the bin, not the median value of the dS array, but that should be easy to change. More later if you are interested.
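Changing the colour from point density to the median of dS can be done directly in hexbin: pass dS as C and np.median as reduce_C_function. A minimal sketch on synthetic data; N is reduced to 100,000 here (an assumption to keep it quick), since computing a median per hexagon is much slower than counting, and mincnt=1 hides empty hexagons.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
N = 10**5  # smaller than 3 million so the per-hexagon median stays fast
delta = rng.normal(size=N)
vf = rng.normal(size=N)
dS = rng.normal(size=N)

fig, ax = plt.subplots()
# C gives one value per point; reduce_C_function combines the values in each hexagon
hb = ax.hexbin(delta, vf, C=dS, reduce_C_function=np.median,
               gridsize=50, mincnt=1, cmap="Paired")
fig.colorbar(hb, ax=ax, label="median of dS")
plt.show()
```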