I have two data sets, which I'd like to scatter plot next to each other with error bars. Below is my code to plot one data set with error bars. And also the code to generate the second data set. I'd like the points and errors for each data for each value to be adjacent.
I'd also like to remove the line connecting the dots.
import random
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as ss
data = []
n = 100
m = 10
for i in xrange(m):
d = []
for j in xrange(n):
d.append(random.random())
data.append(d)
mean_data = []
std_data = []
for i in xrange(m):
mean = np.mean(data[i])
mean_data.append(mean)
std = np.std(data[i])
std_data.append(std)
df_data = [n] * m
plt.errorbar(range(m), mean_data, yerr=ss.t.ppf(0.95, df_data)*std_data)
plt.scatter(range(m), mean_data)
plt.show()
new_data = []
for i in xrange(m):
d = []
for j in xrange(n):
d.append(random.random())
new_data.append(d)
mean_new_data = []
std_new_data = []
for i in xrange(m):
mean = np.mean(new_data[i])
mean_new_data.append(mean)
std = np.std(new_data[i])
std_new_data.append(std)
df_new_data = [n] * m
To remove the line in the scatter plot use the fmt argument in plt.errorbar(). The plt.scatter() call is then no longer needed. To plot a second set of data, simply call plt.errorbar() a second time, with the new data.
If you don't want the datasets to overlap, you can add some small random scatter in x to the new dataset. You can do this in two ways, add a single scatter float with
random.uniform(-x_scatter, x_scatter)
which will move all the points as one:
or generate a random scatter float for each point with
x_scatter = np.random.uniform(-.5, .5, m)
which generates something like
To plot both datasets (using the second method), you can use:
plt.errorbar(
range(m), mean_data, yerr=ss.t.ppf(0.95, df_data)*std_data, fmt='o',
label="Data")
# Add some some random scatter in x
x_scatter = np.random.uniform(-.5, .5, m)
plt.errorbar(
np.arange(m) + x_scatter, mean_new_data,
yerr=ss.t.ppf(0.95, df_new_data)*std_new_data, fmt='o', label="New data")
plt.legend()
plt.show()
Related
I am brand new to coding, please be gentle!
I have a main script that runs a simulation of particles colliding with each other and walls in micro-gravity conditions. This part of the script outputs individual data files containing: timestep, vtotal. There are 15 particles so I get out 15 txt files.
N_max = sim.getNumTimeSteps()
particleData = [ [] for x in range(len(sim.getParticleList()))]
for n in range (N_max):
sim.runTimeStep()
if (n%1000==0):
particles = sim.getParticleList()
for i in range(len(sim.getParticleList())):
print i
x, y, z = particles[i].getVelocity()
particleData[i].append( (n, x, y, z ))
print len(sim.getParticleList())
for i in range(len(sim.getParticleList())):
with open("{0:d}.dat".format(i), "w") as f:
for j in particleData[i]:
f.write("%f,%f \n" % (j[0], (math.sqrt(float(j[1])**2+float(j[2])**2+float(j[3])**2)) ))
sim.exit()
The end result I need to work toward is a graph of the mean of those 15 particles over time. For example, in this simulation it was running for 22000 timesteps, at increments of 1000. Correct me if I am wrong, but the mean should be (vtotal1+vtotal2+vtotal3+...vtotal15)/per increment. When that is plotted over time, a single line represents the mean velocity of the 15 particles from the simulation? Here is a version of what I was doing that was adapted from another averaging attempt.
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import csv
import math
import numpy as np
x = []
y = []
y_mean = np.array([1 for _ in range(22000/1000)])
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(10, 7))
for i in range(15):
x = []
y = []
with open("{}.dat".format(i),'r') as csvfile:
plots = csv.reader(csvfile, delimiter=',')
for row in plots:
x.append(float(row[0]))
y.append(float(row[1]))
y_mean[int(float(row[0]) / 1000)] += y[-1]
axes.plot(x,y, color='skyblue', label="Total v {}".format(i+1))
axes.plot(x,y_mean, color='olive', label="Average v {}".format(i+1))
plt.title('Particles Over Time')
plt.xlabel('Timestep Number')
plt.grid(alpha=.5,linestyle='--')
plt.ylabel('Velocity')
plt.xlim(0, 2000)
plt.show()
plt.autoscale(enable=True, axis=y, tight=True)
plt.legend()
plt.savefig("round2avgs.png")
y_mean = np.asarray(y) / 15
I just don't know what's going wrong. Any assistance is appreciated.
Normally, you should split your data processing and visualization into two different steps.
Say you have a 5 CSVs, all having the same data:
0,1
1000,2
2000,3
3000,4
4000,5
Let's name this 1.dat, 2.dat ... 3.dat.
Import the libraries and load the data
import csv
import matplotlib.pyplot as plt
import numpy as np
x = []
ys = []
for i in range(5):
with open(f'{i+1}.dat') as data_file:
data = csv.reader(data_file, delimiter=',')
y = []
for row in data:
if i == 0:
x.append(float(row[0]))
y.append(float(row[1]))
ys.append(y)
Calculate the means per timestep using numpy
means_per_timestep = np.array(ys).mean(axis=0)
Plot it
plt.plot(x, means_per_timestep)
Is this what you were expecting?
I'm very new to python but am interested in learning a new technique whereby I can identify different data points in a scatter plot with different markers according to where they fall in the scatter plot.
My specific example is much to this: http://www.astroml.org/examples/datasets/plot_sdss_line_ratios.html
I have a BPT plot and want to split the data along the demarcation line line.
I have a data set in this format:
data = [[a,b,c],
[a,b,c],
[a,b,c]
]
And I also have the following for the demarcation line:
NII = np.linspace(-3.0, 0.35)
def log_OIII_Hb_NII(log_NII_Ha, eps=0):
return 1.19 + eps + 0.61 / (log_NII_Ha - eps - 0.47)
Any help would be great!
There was not enough room in the comments section. Not too dissimilar to what #DrV wrote, but maybe more astronomically inclined:
import random
import numpy as np
import matplotlib.pyplot as plt
def log_OIII_Hb_NII(log_NII_Ha, eps=0):
return 1.19 + eps + 0.61 / (log_NII_Ha - eps - 0.47)
# Make some fake measured NII_Ha data
iternum = 100
# Ranged -2.1 to 0.4:
Measured_NII_Ha = np.array([random.random()*2.5-2.1 for i in range(iternum)])
# Ranged -1.5 to 1.5:
Measured_OIII_Hb = np.array([random.random()*3-1.5 for i in range(iternum)])
# For our measured x-value, what is our cut-off value
Measured_Predicted_OIII_Hb = log_OIII_Hb_NII(Measured_NII_Ha)
# Now compare the cut-off line to the measured emission line fluxes
# by using numpy True/False arrays
#
# i.e., x = numpy.array([1,2,3,4])
# >> index = x >= 3
# >> print(index)
# >> numpy.array([False, False, True, True])
# >> print(x[index])
# >> numpy.array([3,4])
Above_Predicted_Red_Index = Measured_OIII_Hb > Measured_Predicted_OIII_Hb
Below_Predicted_Blue_Index = Measured_OIII_Hb < Measured_Predicted_OIII_Hb
# Alternatively, you can invert Above_Predicted_Red_Index
# Make the cut-off line for a range of values for plotting it as
# a continuous line
Predicted_NII_Ha = np.linspace(-3.0, 0.35)
Predicted_log_OIII_Hb_NII = log_OIII_Hb_NII(Predicted_NII_Ha)
fig = plt.figure(0)
ax = fig.add_subplot(111)
# Plot the modelled cut-off line
ax.plot(Predicted_NII_Ha, Predicted_log_OIII_Hb_NII, color="black", lw=2)
# Plot the data for a given colour
ax.errorbar(Measured_NII_Ha[Above_Predicted_Red_Index], Measured_OIII_Hb[Above_Predicted_Red_Index], fmt="o", color="red")
ax.errorbar(Measured_NII_Ha[Below_Predicted_Blue_Index], Measured_OIII_Hb[Below_Predicted_Blue_Index], fmt="o", color="blue")
# Make it aesthetically pleasing
ax.set_ylabel(r"$\rm \log([OIII]/H\beta)$")
ax.set_xlabel(r"$\rm \log([NII]/H\alpha)$")
plt.show()
I assume you have the pixel coordinates as a, b in your example. The column with cs is then something that is used to calculate whether a point belongs to one of the two groups.
Make your data first an ndarray:
import numpy as np
data = np.array(data)
Now you may create two arrays by checking which part of the data belongs to which area:
dataselector = log_OIII_Hb_NII(data[:,2]) > 0
This creates a vector of Trues and Falses which has a True whenever the data in the third column (column 2) gives a positive value from the function. The length of the vector equals to the number of rows in data.
Then you can plot the two data sets:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
# the plotting part
ax.plot(data[dataselector,0], data[dataselector,1], 'ro')
ax.plot(data[-dataselector,0], data[-dataselector,1], 'bo')
I.e.:
create a list of True/False values which tells which rows of data belong to which group
plot the two groups (-dataselector means "all the rows where there is a False in dataselector")
I am trying to make a contour plot of the following data using matplotlib in python. The data is of this form -
# x y height
77.23 22.34 56
77.53 22.87 63
77.37 22.54 72
77.29 22.44 88
The data actually consists of nearly 10,000 points, which I am reading from an input file. However the set of distinct possible values of z is small (within 50-90, integers), and I wish to have a contour lines for every such distinct z.
Here is my code -
import matplotlib
import numpy as np
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import csv
import sys
# read data from file
data = csv.reader(open(sys.argv[1], 'rb'), delimiter='|', quotechar='"')
x = []
y = []
z = []
for row in data:
try:
x.append(float(row[0]))
y.append(float(row[1]))
z.append(float(row[2]))
except Exception as e:
pass
#print e
X, Y = np.meshgrid(x, y) # (I don't understand why is this required)
# creating a 2D array of z whose leading diagonal elements
# are the z values from the data set and the off-diagonal
# elements are 0, as I don't care about them.
z_2d = []
default = 0
for i, no in enumerate(z):
z_temp = []
for j in xrange(i): z_temp.append(default)
z_temp.append(no)
for j in xrange(i+1, len(x)): z_temp.append(default)
z_2d.append(z_temp)
Z = z_2d
CS = plt.contour(X, Y, Z, list(set(z)))
plt.figure()
CB = plt.colorbar(CS, shrink=0.8, extend='both')
plt.show()
Here is the plot of a small sample of data -
Here is a close look to one of the regions of the above plot (note the overlapping/intersecting lines) -
I don't understand why it doesn't look like a contour plot. The lines are intersecting, which shouldn't happen. What can be possibly wrong? Please help.
Try to use the following code. This might help you -- it's the same thing which was in the Cookbook:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.mlab import griddata
# with this way you can load your csv-file really easy -- maybe you should change
# the last 'dtype' to 'int', because you said you have int for the last column
data = np.genfromtxt('output.csv', dtype=[('x',float),('y',float),('z',float)],
comments='"', delimiter='|')
# just an assigning for better look in the plot routines
x = data['x']
y = data['y']
z = data['z']
# just an arbitrary number for grid point
ngrid = 500
# create an array with same difference between the entries
# you could use x.min()/x.max() for creating xi and y.min()/y.max() for yi
xi = np.linspace(-1,1,ngrid)
yi = np.linspace(-1,1,ngrid)
# create the grid data for the contour plot
zi = griddata(x,y,z,xi,yi)
# plot the contour and a scatter plot for checking if everything went right
plt.contour(xi,yi,zi,20,linewidths=1)
plt.scatter(x,y,c=z,s=20)
plt.xlim(-1,1)
plt.ylim(-1,1)
plt.show()
I created a sample output file with an Gaussian distribution in 2D. My result with using the code from above:
NOTE:
Maybe you noticed that the edges are kind of cropped. This is due to the fact that the griddata-function create masked arrays. I mean the border of the plot is created by the outer points. Everything outside the border is not there. If your points would be on a line then you will not have any contour for plotting. This is kind of logical. I mention it, cause of your four posted data points. It seems likely that you have this case. Maybe you don't have it =)
UPDATE
I edited the code a bit. Your problem was probably that you didn't resolve the dependencies of your input-file correctly. With the following code the plot should work correctly.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.mlab import griddata
import csv
data = np.genfromtxt('example.csv', dtype=[('x',float),('y',float),('z',float)],
comments='"', delimiter=',')
sample_pts = 500
con_levels = 20
x = data['x']
xmin = x.min()
xmax = x.max()
y = data['y']
ymin = y.min()
ymax = y.max()
z = data['z']
xi = np.linspace(xmin,xmax,sample_pts)
yi = np.linspace(ymin,ymax,sample_pts)
zi = griddata(x,y,z,xi,yi)
plt.contour(xi,yi,zi,con_levels,linewidths=1)
plt.scatter(x,y,c=z,s=20)
plt.xlim(xmin,xmax)
plt.ylim(ymin,ymax)
plt.show()
With this code and your small sample I get the following plot:
Try to use my snippet and just change it a bit. For example, I had to change for the given sample csv-file the delimitter from | to ,. The code I wrote for you is not really nice, but it's written straight foreword.
Sorry for the late response.
I am plotting a confusion matrix with matplotlib with the following code:
from numpy import *
import matplotlib.pyplot as plt
from pylab import *
conf_arr = [[33,2,0,0,0,0,0,0,0,1,3], [3,31,0,0,0,0,0,0,0,0,0], [0,4,41,0,0,0,0,0,0,0,1], [0,1,0,30,0,6,0,0,0,0,1], [0,0,0,0,38,10,0,0,0,0,0], [0,0,0,3,1,39,0,0,0,0,4], [0,2,2,0,4,1,31,0,0,0,2], [0,1,0,0,0,0,0,36,0,2,0], [0,0,0,0,0,0,1,5,37,5,1], [3,0,0,0,0,0,0,0,0,39,0], [0,0,0,0,0,0,0,0,0,0,38] ]
norm_conf = []
for i in conf_arr:
a = 0
tmp_arr = []
a = sum(i,0)
for j in i:
tmp_arr.append(float(j)/float(a))
norm_conf.append(tmp_arr)
plt.clf()
fig = plt.figure()
ax = fig.add_subplot(111)
res = ax.imshow(array(norm_conf), cmap=cm.jet, interpolation='nearest')
cb = fig.colorbar(res)
savefig("confmat.png", format="png")
But I want to the confusion matrix to show the numbers on it like this graphic (the right one). How can I plot the conf_arr on the graphic?
You can use text to put arbitrary text in your plot. For example, inserting the following lines into your code will write the numbers (note the first and last lines are from your code to show you where to insert my lines):
res = ax.imshow(array(norm_conf), cmap=cm.jet, interpolation='nearest')
for i, cas in enumerate(conf_arr):
for j, c in enumerate(cas):
if c>0:
plt.text(j-.2, i+.2, c, fontsize=14)
cb = fig.colorbar(res)
The only way I could really see of doing it was to use annotations. Try these lines:
for i,j in ((x,y) for x in xrange(len(conf_arr))
for y in xrange(len(conf_arr[0]))):
ax.annotate(str(conf_arr[i][j]),xy=(i,j))
before saving the figure. It adds the numbers, but I'll let you figure out how to get the sizes of the numbers how you want them.
I have a simple plot with several sets of points and lines connecting each set. I want the points to be plotted on top of the lines (so that the line doesn't show inside the point). Regardless of order of the plot and scatter calls, this plot comes out the same, and not as I'd like. Is there a simple way to do it?
import math
import matplotlib.pyplot as plt
def poisson(m):
def f(k):
e = math.e**(-m)
f = math.factorial(k)
g = m**k
return g*e/f
return f
R = range(20)
L = list()
means = (1,4,10)
for m in means:
f = poisson(m)
L.append([f(k) for k in R])
colors = ['r','b','purple']
for c,P in zip(colors,L):
plt.plot(R,P,color='0.2',lw=1.5)
plt.scatter(R,P,s=150,color=c)
ax = plt.axes()
ax.set_xlim(-0.5,20)
ax.set_ylim(-0.01,0.4)
plt.savefig('example.png')
You need to set the Z-order.
plt.plot(R,P,color='0.2',lw=1.5, zorder=1)
plt.scatter(R,P,s=150,color=c, zorder=2)
Check out this example.
http://matplotlib.sourceforge.net/examples/pylab_examples/zorder_demo.html