I read a waveform from an oscilloscope. The waveform is divided into 10 segments as a function of time. I want to plot the complete waveform, one segment above (or under) another, 'with a vertical offset', so to speak. Additionally, a color map is necessary to show the signal intensity. I've only been able to get the following plot:
As you can see, all the curves are superimposed, which is unacceptable. One could add an offset to the y data but this is not how I would like to do it. Surely there is a much neater way of plotting my data? I've tried a few things to solve this issue using pylab but I am not even sure how to proceed and if this is the right way to go.
import readTrc #helps read binary data from an oscilloscope
import matplotlib.pyplot as plt
fName = r"...trc"
datX, datY, m = readTrc.readTrc(fName)
segments = m['SUBARRAY_COUNT'] #number of segments
x, y = [], []
for i in range(segments+1):
A plot with a vertical offset sounds like a frequency trail.
Here's one approach that does just adjust the y value.
Frequency Trail in MatPlotLib
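A minimal sketch of that offset approach with plain matplotlib, assuming the segments are already available as rows of a 2D array. The synthetic sine data, the `step` spacing and the variable names are placeholders for illustration, not part of the question's data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the oscilloscope data: 10 segments of 200 samples each.
n_segments, n_points = 10, 200
t = np.linspace(0.0, 1.0, n_points)
segments = np.array([np.sin(2 * np.pi * (k + 1) * t) for k in range(n_segments)])

# Shift every segment by a fixed step so neighbouring traces do not overlap.
step = 2.5  # assumed spacing; anything above the peak-to-peak amplitude works
offsets = step * np.arange(n_segments)
shifted = segments + offsets[:, None]

fig, ax = plt.subplots(figsize=(8, 6))
# Colour each sample by its unshifted value so the colour map shows signal intensity.
sc = ax.scatter(np.tile(t, n_segments), shifted.ravel(),
                c=segments.ravel(), cmap="viridis", s=4)
fig.colorbar(sc, ax=ax, label="signal")

# Label the ticks with the segment index instead of the shifted amplitude.
ax.set_yticks(offsets)
ax.set_yticklabels([str(k) for k in range(n_segments)])
ax.set_xlabel("time")
ax.set_ylabel("segment")
```

The y data is still shifted under the hood, but relabelling the ticks hides the offset from the reader.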
The same plot has also been coined a joyplot/ridgeline plot. Seaborn has an implementation that creates a series of plots (FacetGrid), and then adjusts the offset between them for a similar effect.
An example using a line plot might look like:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
segments = 10
points_per_segment = 100
#your data preparation will vary
x = np.tile(np.arange(points_per_segment), segments)
z = np.floor(np.arange(points_per_segment * segments)/points_per_segment)
y = np.sin(x * (1 + z))
df = pd.DataFrame({'x': x, 'y': y, 'z': z})
pal = sns.color_palette()
g = sns.FacetGrid(df, row="z", hue="z", aspect=15, height=.5, palette=pal)
g.map(plt.plot, 'x', 'y')
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Set the subplots to overlap (the hspace value is a guess; tune it to taste)
g.fig.subplots_adjust(hspace=-0.25)
g.despine(bottom=True, left=True)
Is there a way to have a third variable control the color gradient on a log-scaled plot? Also: how would I make a color legend for it? I want it to look something like the image linked below.
#creating arrays
sulfate = np.array(master['SO4-2_(input)'])
chloride = np.array(master['Cl-_(input)'])
pH = np.array(master['pH'])
#create plot
fig, ax = plt.subplots()
#add 1:1 ratio line
plt.plot( [0,1],[0,1] )
#x and y axes lims
When I try to use the technique for a typical scatter plot it says that the variable is not a valid value for color.
As suggested in JohanC's comment, use the scatter function and then set the axis scales to logarithmic separately. To get a colorbar, use colorbar. If you also want the colorbar to have logarithmic scaling (I am not sure if that is what you want), use the norm argument of scatter and provide a matplotlib.colors.LogNorm.
from matplotlib.colors import LogNorm
import matplotlib.pyplot as plt
import numpy as np
# Create some mock data
sulfate = np.random.rand(20)
chloride = np.random.rand(20)
pH = np.arange(20) + 1
# Create the plot
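plt.scatter(sulfate, chloride, c=pH, norm=LogNorm(), cmap="cividis")
# Set both axes to logarithmic scale and add the colorbar
plt.xscale("log")
plt.yscale("log")
plt.colorbar(label="pH")
plt.show()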
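To see what LogNorm does to the color values, here is a small sketch. The vmin/vmax of 1 and 100 are arbitrary choices for illustration; when they are left out, as in the call above, they are taken from the data:

```python
from matplotlib.colors import LogNorm
import numpy as np

# Map values onto [0, 1] on a logarithmic scale between vmin and vmax.
norm = LogNorm(vmin=1, vmax=100)
values = np.array([1.0, 10.0, 100.0])
scaled = norm(values)
print(scaled)
```

On this scale 10 lands halfway between 1 and 100, so it gets the color from the middle of the colormap.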
Depending on what data format your original variable master is in, there might be easier ways to produce this plot. For example, with xarray:
import xarray as xr
ds = xr.Dataset(
    data_vars={"sulfate": ("x", sulfate), "chloride": ("x", chloride), "pH": ("x", pH)}
)
Or with pandas:
df = ds.to_dataframe()
ax = df.plot.scatter(
I have plotted a seaborn scatter plot. My data consists of 5000 data points. Looking at the plot, I am definitely not seeing 5000 points, so I'm pretty sure some kind of sampling is performed by the seaborn scatterplot function. I want to know how many data points each point in the plot represents. In case it depends on the code, here it is:
g = sns.scatterplot(x=data['x'], y=data['y'],hue=data['P'], s=40, edgecolor='k', alpha=0.8, legend="full")
Nothing would really suggest to me that seaborn is sampling your data. However, you can check the data in your axes g to be sure. Query the children of the axes for a PathCollection (scatter plot) object:
It's probably the first item in the list that is returned. From there you can use get_offsets to retrieve the data and check its shape.
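A minimal sketch of that check, using made-up random data in place of the question's dataframe (the arrays `x`, `y` and `p` are assumptions for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
from matplotlib.collections import PathCollection
import numpy as np
import seaborn as sns

# Made-up stand-in for the 5000-row dataset.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = rng.normal(size=5000)
p = rng.integers(0, 3, size=5000)

ax = sns.scatterplot(x=x, y=y, hue=p, s=40, edgecolor="k", alpha=0.8, legend="full")

# Collect every scatter artist on the axes and count the points it holds.
scatters = [c for c in ax.get_children() if isinstance(c, PathCollection)]
n_points = sum(len(c.get_offsets()) for c in scatters)
print(n_points)
```

If the count matches the length of your data, every point was drawn and the missing ones are simply hidden behind others.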
As far as I know, no sampling is performed. In the picture you posted, you can see that most of the data points simply overlap, which is probably why you cannot see 5000 points. Try with fewer points and you will see that all of them get plotted.
In order to check whether or not Seaborn's scatter removes points, here is a way to see 5000 different points. No points seem to be missing.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
x = np.linspace(1, 100, 100)
y = np.linspace(1, 50, 50)
X, Y = np.meshgrid(x, y)
Z = (X * Y) % 25
X = np.ravel(X)
Y = np.ravel(Y)
Z = np.ravel(Z)
sns.scatterplot(x=X, y=Y, s=15, hue=Z, palette=plt.cm.plasma, legend=False)
I have a simple pandas dataframe that I want to plot with matplotlib:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('SAT_data.xlsx', index_col = 'State')
plt.scatter(df['Year'], df['Reading'], c = 'blue', s = 25)
plt.scatter(df['Year'], df['Math'], c = 'orange', s = 25)
plt.scatter(df['Year'], df['Writing'], c = 'red', s = 25)
Here is what my plot looks like:
I'd like to shift the blue data points a bit to the left, and the red ones a bit to the right, so each year on the x-axis has three mini-columns of scatter data above it instead of all three datasets overlapping. I tried and failed to use the 'verts' argument properly. Is there a better way to do this?
Using an offset transform lets you shift the scatter points by some amount in units of points instead of data units. The advantage is that they then always sit tight against each other, independent of the figure size, zoom level, etc.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(0)
import matplotlib.transforms as transforms
year = np.random.choice(np.arange(2006,2017), size=(300) )
values = np.random.rand(300, 3)
offset = lambda p: transforms.ScaledTranslation(p/72.,0, plt.gcf().dpi_scale_trans)
trans = plt.gca().transData
sc1 = plt.scatter(year, values[:,0], c = 'blue', s = 25, transform=trans+offset(-5))
plt.scatter(year, values[:,1], c = 'orange', s = 25)
plt.scatter(year, values[:,2], c = 'red', s = 25, transform=trans+offset(5))
Broad figure:
Normal figure:
Some explanation:
The problem is that we want to add an offset in points to some data in data coordinates. While data coordinates are automatically transformed to display coordinates using the transData (which we normally don't even see on the surface), adding some offset requires us to change the transform.
We do this by adding an offset. While we could just add an offset in pixels (display coordinates), it is more convenient to add the offset in points and thereby use the same unit that the size of the scatter points is given in (their size is actually in points squared).
So how many pixels are p points? We find out by dividing p by the ppi (points per inch) to obtain inches, and then multiplying by the dpi (dots per inch) to obtain display pixels. This calculation is done in the ScaledTranslation.
While the dots per inch are in principle variable (and taken care of by the dpi_scale_trans transform), the points per inch are fixed. Matplotlib uses 72 ppi, which is kind of a typesetting standard.
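As a worked example of that conversion, here is a minimal sketch that compares the hand calculation with what ScaledTranslation produces. The figure dpi of 100 and the 5 pt offset are arbitrary choices for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms

fig = plt.figure(dpi=100)
p = 5  # offset in points

# Hand calculation: points -> inches (divide by 72) -> pixels (multiply by dpi).
expected_pixels = p / 72.0 * fig.dpi

# The same conversion as done by ScaledTranslation: applied to the origin,
# the transform returns just the translation, in display pixels.
shift = transforms.ScaledTranslation(p / 72.0, 0, fig.dpi_scale_trans)
dx_pixels, dy_pixels = shift.transform((0, 0))

print(expected_pixels, dx_pixels, dy_pixels)
```

Because the translation goes through dpi_scale_trans, the pixel offset follows the figure's dpi while staying 5 pt wide.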
A quick and dirty way would be to create a small offset dx and subtract it from x values of blue points and add to x values of red points.
dx = 0.1
plt.scatter(df['Year'] - dx, df['Reading'], c = 'blue', s = 25)
plt.scatter(df['Year'], df['Math'], c = 'orange', s = 25)
plt.scatter(df['Year'] + dx, df['Writing'], c = 'red', s = 25)
One more option is the stripplot function from the seaborn library. First melt the original dataframe into long form so that each row contains a year, a test and a score. Then make a stripplot with year as x, score as y and test as hue. The dodge keyword argument (called split in older seaborn releases) is what plots the categories as separate stripes for each x. There is also the jitter argument, which adds some noise to the x values so that the points spread over a small area instead of sitting on a single vertical line.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
# make up example data
df = pd.DataFrame(columns = ['Reading','Math','Writing'],
                  data = np.random.normal(540,30,size=(1000,3)))
df['Year'] = np.random.choice(np.arange(2006,2016),size=1000)
# melt the data into long form
df1 = pd.melt(df, var_name='Test', value_name='Score',id_vars=['Year'])
# make a stripplot
fig, ax = plt.subplots(figsize=(10,7))
sns.stripplot(data = df1, x='Year', y = 'Score', hue = 'Test',
              jitter = True, dodge = True, alpha = 0.7,
              palette = ['blue','orange','red'], ax = ax)
Here is how the given code can be adapted to work with multiple subplots, and also to a situation without "middle column".
To adapt the given code, ax[n,p].transData is needed instead of plt.gca().transData. plt.gca() refers to the last created subplot, while now you'll need the transform of each individual subplot.
Another problem is that when plotting only via a transform, matplotlib doesn't automatically set the lower and upper limits of the subplot. The given example plots the points "in the middle" without a specific transform, and the plot gets "zoomed out" around these points (orange in the example).
If you don't have points at the center, the limits need to be set in another way. The way I came up with is to plot some dummy points in the middle (which sets the zooming limits) and then remove them again.
Also note that the size of the scatter dots is given as the square of their diameter (measured in "unit points"). To have touching dots, you'd need to use the square root for their offset.
import matplotlib.pyplot as plt
from matplotlib import transforms
import numpy as np
# Set up data for reproducible example
year = np.random.choice(np.arange(2006, 2017), size=(100))
data = np.random.rand(4, 100, 3)
data2 = np.random.rand(4, 100, 3)
# Create plot and set up subplot ax loop
fig, axs = plt.subplots(2, 2, figsize=(18, 14))
# Set up offset with transform
offset = lambda p: transforms.ScaledTranslation(p / 72., 0, plt.gcf().dpi_scale_trans)
# Plot data in a loop
for ax, q, r in zip(axs.flat, data, data2):
    # Dummy points without a transform, only there to set the axis limits
    temp_points = ax.plot(year, q, ls=' ')
    for pnt in temp_points:
        pnt.remove()
    ax.plot(year, q, marker='.', ls=' ', ms=10, c='b', transform=ax.transData + offset(-np.sqrt(10)))
    ax.plot(year, r, marker='.', ls=' ', ms=10, c='g', transform=ax.transData + offset(+np.sqrt(10)))
plt.show()
I would like to make beautiful scatter plots with histograms above and right of the scatter plot, as it is possible in seaborn with jointplot:
I am looking for suggestions on how to achieve this. In fact I am having some trouble installing pandas, and I also do not need the entire seaborn module.
I encountered the same problem today. Additionally I wanted a CDF for the marginals.
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import numpy as np
x = np.random.beta(2,5,size=int(1e4))
y = np.random.randn(int(1e4))
fig = plt.figure(figsize=(8,8))
gs = gridspec.GridSpec(3, 3)
ax_main = plt.subplot(gs[1:3, :2])
ax_xDist = plt.subplot(gs[0, :2],sharex=ax_main)
ax_yDist = plt.subplot(gs[1:3, 2],sharey=ax_main)
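ax_main.scatter(x, y, marker='.')
ax_main.set(xlabel="x data", ylabel="y data")
ax_xDist.hist(x, bins=100, align='mid')
ax_xCumDist = ax_xDist.twinx()
ax_xCumDist.hist(x, bins=100, cumulative=True, histtype='step', density=True, color='r', align='mid')
ax_xCumDist.tick_params('y', colors='r')
ax_yDist.hist(y, bins=100, orientation='horizontal', align='mid')
ax_yCumDist = ax_yDist.twiny()
ax_yCumDist.hist(y, bins=100, cumulative=True, histtype='step', density=True, color='r', align='mid', orientation='horizontal')
ax_yCumDist.tick_params('x', colors='r')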
Hope it helps the next person searching for scatter-plot with marginal distribution.
Here's an example of how to do it, using gridspec.GridSpec:
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np
x = np.random.rand(50)
y = np.random.rand(50)
fig = plt.figure()
gs = GridSpec(4,4)
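ax_joint = fig.add_subplot(gs[1:4,0:3])
ax_marg_x = fig.add_subplot(gs[0,0:3])
ax_marg_y = fig.add_subplot(gs[1:4,3])
# Scatter in the joint axes, histograms in the marginal axes
ax_joint.scatter(x, y)
ax_marg_x.hist(x)
ax_marg_y.hist(y, orientation="horizontal")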
# Turn off tick labels on marginals
plt.setp(ax_marg_x.get_xticklabels(), visible=False)
plt.setp(ax_marg_y.get_yticklabels(), visible=False)
# Set labels on joint
ax_joint.set_xlabel('Joint x label')
ax_joint.set_ylabel('Joint y label')
# Set labels on marginals
ax_marg_y.set_xlabel('Marginal x label')
ax_marg_x.set_ylabel('Marginal y label')
I strongly recommend flipping the right histogram by adding these 3 lines of code to the current best answer before plt.show():
The advantage is that anyone looking at the figure can easily compare the two histograms by mentally moving the right histogram and rotating it clockwise.
In contrast, in the plot of the question and in all other answers, if you want to compare the two histograms, your first reaction is to rotate the right histogram counterclockwise, which leads to wrong conclusions because the y axis gets inverted. Indeed, the right CDF of the current best answer looks decreasing at first sight:
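The three lines themselves are not reproduced in this thread. As a rough sketch of what such a flip could look like, assuming it comes down to inverting the count axis of the right-hand histogram (the axes names and the data here are made up and this is not necessarily the original code):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np

rng = np.random.default_rng(0)
x = rng.beta(2, 5, size=1000)
y = rng.normal(size=1000)

fig = plt.figure(figsize=(6, 6))
gs = GridSpec(3, 3)
ax_main = fig.add_subplot(gs[1:3, :2])
ax_x = fig.add_subplot(gs[0, :2], sharex=ax_main)
ax_y = fig.add_subplot(gs[1:3, 2], sharey=ax_main)

ax_main.scatter(x, y, marker=".")
ax_x.hist(x, bins=50)
ax_y.hist(y, bins=50, orientation="horizontal")

# Flip the right-hand histogram: counts now grow towards the left, so a
# clockwise mental rotation lines it up with the top histogram.
ax_y.invert_xaxis()
# Move the tick labels to the outer edge, away from the bar baseline.
ax_y.yaxis.tick_right()
```

A cumulative histogram drawn on a twin axes would need its own invert_xaxis() call, since the twin's x axis is independent.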
I am still finding my feet with Python, so apologies if this is a very simple question.
I have an output file which contains 5 columns, as follows:
Depth Data#1 Data#2 Data#3 Standard_deviation
These columns contain 500 values, if this makes any difference.
What I am trying to do is simply plot data#1, data#2, and data#3 (on the x axis) against depth (on the y axis). I would like data#1 to be blue, and data#2 and data#3 to each be red.
The figsize I would like is (14,6).
I don't want the column containing standard deviation to be plotted here. If it is simpler, I can simply remove that column from the output.
Thanks in advance for any help!
With nearly everything in matplotlib, if I don't already know how to do something, I scan through the Gallery for something that looks similar to what I want and then adapt the code from there.
This one has most of what you want in it:
"""
This shows an example of the "fivethirtyeight" styling, which
tries to replicate the styles from FiveThirtyEight.com.
"""
from matplotlib import pyplot as plt
import numpy as np
x = np.linspace(0, 10)
with plt.style.context('fivethirtyeight'):
    plt.plot(x, np.sin(x) + x + np.random.randn(50))
    plt.plot(x, np.sin(x) + 0.5 * x + np.random.randn(50))
    plt.plot(x, np.sin(x) + 2 * x + np.random.randn(50))
It does unfortunately have a load of extra stuff in it you don't want, but the part you should pick up on is that plt.plot(...) can just be called multiple times to plot multiple lines.
Then it's just a case of applying this:
from matplotlib import pyplot
#Make some data
depth = range(500)
allData = zip(*[[x, 2*x, 3*x] for x in depth])
#Set out colours
colours = ["blue", "red", "red"]
for data, colour in zip(allData, colours):
    pyplot.plot(depth, data, color=colour)
It's matplotlib basics:
import pylab as pl
data = pl.loadtxt("myfile.txt")
pl.plot(data[:,1], data[:,0], "b")
pl.plot(data[:,2], data[:,0], "r")
pl.plot(data[:,3], data[:,0], "r")
As the question only regards plotting, I am assuming you know how to read the data from the file. As for the plotting, what you need is the following:
import matplotlib.pyplot as plt
#Create a figure with a certain size
plt.figure(figsize = (14, 6))
#Plot x versus y
plt.plot(data1, depth, color = "blue")
plt.plot(data2, depth, color = "red")
plt.plot(data3, depth, color = "red")
#Save the figure
plt.savefig("figure.png", dpi = 300, bbox_inches = "tight")
#Show the figure
plt.show()
The option bbox_inches = "tight" in savefig results in removing all the excess white boundaries of the figure.
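For completeness, here is a sketch that also covers reading the file. It assumes a whitespace-separated text file with a single header row. The in-memory buffer and the made-up values stand in for the real output file, so swap the buffer for your file name:

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for the output file: one header row, five columns.
depth = np.arange(500)
table = np.column_stack([depth, depth * 1.0, depth * 2.0, depth * 3.0, np.ones(500)])
buffer = io.StringIO()
np.savetxt(buffer, table, header="Depth Data#1 Data#2 Data#3 Standard_deviation", comments="")
buffer.seek(0)

# skiprows=1 skips the header; usecols leaves out the standard-deviation column.
data = np.loadtxt(buffer, skiprows=1, usecols=(0, 1, 2, 3))

fig, ax = plt.subplots(figsize=(14, 6))
# Columns 1 to 3 go on the x axis, depth (column 0) on the y axis.
for column, colour in zip((1, 2, 3), ("blue", "red", "red")):
    ax.plot(data[:, column], data[:, 0], color=colour)
ax.set_xlabel("data")
ax.set_ylabel("depth")
```

Skipping the header row matters here because the column names contain "#", which loadtxt would otherwise treat as the start of a comment.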