Check if seaborn scatterplot function is sampling data - python

I have plotted a seaborn scatter plot. My data consists of 5000 data points. Looking at the plot, I am definitely not seeing 5000 points, so I'm pretty sure some kind of sampling is performed by the seaborn scatterplot function. I want to know how many data points each point in the plot represents. If it depends on the code, here it is:
g = sns.scatterplot(x=data['x'], y=data['y'], hue=data['P'], s=40, edgecolor='k', alpha=0.8, legend="full")

Nothing would really suggest to me that seaborn is sampling your data. However, you can check the data in your axes g to be sure. Query the children of the axes for a PathCollection (scatter plot) object:
g.get_children()
It's probably the first item in the list that is returned. From there you can use get_offsets to retrieve the data and check its shape.
g.get_children()[0].get_offsets().shape
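For instance, a minimal check along these lines, using a made-up DataFrame (your data will differ), should report a shape of (5000, 2):
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.collections import PathCollection
rng = np.random.default_rng(0)
data = pd.DataFrame({'x': rng.normal(size=5000),
                     'y': rng.normal(size=5000),
                     'P': rng.integers(0, 3, size=5000)})
g = sns.scatterplot(x=data['x'], y=data['y'], hue=data['P'], s=40)
# filter for the PathCollection explicitly instead of relying on index 0
pc = [c for c in g.get_children() if isinstance(c, PathCollection)][0]
print(pc.get_offsets().shape)  # (5000, 2): all points are present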

As far as I know, no sampling is performed. In the picture you posted, you can see that most of the data points simply overlap, which is probably why you cannot see 5000 distinct points. Try with fewer points and you will see that all of them get plotted.

To check whether seaborn's scatterplot removes points, here is an example that plots 5000 distinct points; none appear to be missing.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
x = np.linspace(1, 100, 100)
y = np.linspace(1, 50, 50)
X, Y = np.meshgrid(x, y)
Z = (X * Y) % 25
X = np.ravel(X)
Y = np.ravel(Y)
Z = np.ravel(Z)
sns.scatterplot(x=X, y=Y, s=15, hue=Z, palette=plt.cm.plasma, legend=False)
plt.show()

Related

Linearly scale axes from kilometers to meters for all plots in matplotlib

I am working with data in meters and want to plot positions. Having the ticks in meters obscures the readability of the plots, so I want to show them in kilometers. I know that it is possible to scale all the data by d/1000; however, this makes the code less readable in my eyes, especially when plotting many different lines, where you have to apply this transformation every time.
I am looking for a general way to achieve this type of transformation; I imagine there is an elegant way to do it in matplotlib.
Some sample code for you:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(7500, 30000, 300)
y_ref = np.linspace(5000, 15000, 300)
y_noised = y_ref + np.random.normal(0, 250, size=y_ref.size)
fig = plt.Figure(figsize=(6,6))
ax = fig.add_subplot(1, 1, 1)
ax.plot(x, y_ref, c='r')
ax.scatter(x, y_noised, alpha=0.2)
fig
I would like to have the following figure, without needing to scale x, y_ref and y_noised individually by 1000.
Is there a way to perform this transformation in matplotlib, such that you only need to do it for each figure once, no matter how many lines you plot?
You could use a custom tick formatter like this (passing a function into set_major_formatter creates a FuncFormatter):
m2km = lambda x, _: f'{x/1000:g}'
ax.xaxis.set_major_formatter(m2km)
ax.yaxis.set_major_formatter(m2km)
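Purely as an illustration, here is how that formatter could be wired into the sample code from the question; the explicit FuncFormatter wrapper is used so it also works on older matplotlib versions that don't accept a bare callable:
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
x = np.linspace(7500, 30000, 300)
y_ref = np.linspace(5000, 15000, 300)
y_noised = y_ref + np.random.normal(0, 250, size=y_ref.size)
fig, ax = plt.subplots(figsize=(6, 6))
ax.plot(x, y_ref, c='r')
ax.scatter(x, y_noised, alpha=0.2)
m2km = lambda val, _: f'{val/1000:g}'  # tick value in metres -> label in km
ax.xaxis.set_major_formatter(mticker.FuncFormatter(m2km))
ax.yaxis.set_major_formatter(mticker.FuncFormatter(m2km))
plt.show()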

How to plot several curves with an offset on the same graph

I read a waveform from an oscilloscope. The waveform is divided into 10 segments as a function of time. I want to plot the complete waveform, one segment above (or under) another, 'with a vertical offset', so to speak. Additionally, a color map is necessary to show the signal intensity. I've only been able to get the following plot:
As you can see, all the curves are superimposed, which is unacceptable. One could add an offset to the y data, but this is not how I would like to do it. Surely there is a much neater way of plotting my data? I've tried a few things to solve this using pylab, but I am not even sure how to proceed or whether this is the right way to go.
Any help will be appreciated.
import readTrc #helps read binary data from an oscilloscope
import matplotlib.pyplot as plt
fName = r"...trc"
datX, datY, m = readTrc.readTrc(fName)
segments = m['SUBARRAY_COUNT'] #number of segments
x, y = [], []
for i in range(segments+1):
    x.append(datX[segments*i:segments*(i+1)])
    y.append(datY[segments*i:segments*(i+1)])
plt.plot(x,y)
plt.show()
A plot with a vertical offset sounds like a frequency trail.
Here's one approach, which simply adjusts the y values:
Frequency Trail in MatPlotLib
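A minimal sketch of that idea, with synthetic segments standing in for the oscilloscope data (the offset value is arbitrary):
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 1, 500)
rng = np.random.default_rng(0)
segments = [np.sin(2 * np.pi * 5 * t) * np.exp(-3 * t) + 0.1 * rng.standard_normal(t.size)
            for _ in range(10)]
offset = 2.5  # vertical spacing between traces
for i, seg in enumerate(segments):
    plt.plot(t, seg + i * offset, color=plt.cm.viridis(i / len(segments)))
plt.yticks([])  # absolute y values are meaningless after shifting
plt.show()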
The same plot has also been coined a joyplot/ridgeline plot. Seaborn has an implementation that creates a series of plots (FacetGrid), and then adjusts the offset between them for a similar effect.
https://seaborn.pydata.org/examples/kde_joyplot.html
An example using a line plot might look like:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
segments = 10
points_per_segment = 100
#your data preparation will vary
x = np.tile(np.arange(points_per_segment), segments)
z = np.floor(np.arange(points_per_segment * segments)/points_per_segment)
y = np.sin(x * (1 + z))
df = pd.DataFrame({'x': x, 'y': y, 'z': z})
pal = sns.color_palette()
g = sns.FacetGrid(df, row="z", hue="z", aspect=15, height=.5, palette=pal)
g.map(plt.plot, 'x', 'y')
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Pull the subplots together (a more negative hspace makes them overlap)
g.fig.subplots_adjust(hspace=-.00)
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
plt.show()
Out: a vertical stack of line plots, one row per z value.

Plot Markers on Curve where Value of X is known in matplotlib

I plotted a curve against a time series using data from an experiment. The data is collected at 10 ms intervals and is a single 1-D array.
I have also calculated an array containing the times at which a certain device is triggered, and I drew axvlines at these trigger locations.
Now I want to show markers where my curve crosses these axvlines. How can I do it?
The trigger times (x values) are known. The curve is drawn from irregular experimental data, so it has no equation. The trigger interval is also not always the same.
Thanks.
P.S. I also use multiple parasite axes on the figure. Not that it really matters, but just in case.
Want Markers On Curve Where AXVline Crosses
You can use numpy.interp() to interpolate the data.
import numpy as np
import matplotlib.pyplot as plt
trig = np.array([0.4,1.3,2.1])
time = np.linspace(0,3,9)
signal = np.sin(time)+1.3
fig, ax = plt.subplots()
ax.plot(time, signal)
for x in trig:
    ax.axvline(x, color="limegreen")
#interpolate:
y = np.interp(trig, time, signal)
ax.plot(trig, y, ls="", marker="*", ms=15, color="crimson")
plt.show()
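Note that np.interp assumes its second argument (time here) is monotonically increasing; if your time array is not sorted, sort both arrays first (e.g. with np.argsort), otherwise the interpolated values will be wrong.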

Scale colormap for contour and contourf

I'm trying to plot the contour map of a given function f(x,y), but since the function's output grows really fast, I'm losing a lot of information at lower values of x and y. I found on the forums that this can be worked around using vmax; it actually worked, but only for a specific range of x and y and a specific number of colormap levels.
Say I have this plot:
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
u = np.linspace(-2,2,1000)
x,y = np.meshgrid(u,u)
z = (1-x)**2+100*(y-x**2)**2
cont = plt.contour(x,y,z,500,colors='black',linewidths=.3)
cont = plt.contourf(x,y,z,500,cmap="jet",vmax=100)
plt.colorbar(cont)
plt.show()
I want to uncover what's beyond the axis limits while keeping the same scale, but if I change the x and y limits to -3 and 3 I get:
See how I lost most of my levels, since the function's maximum value at these limits is much higher. A workaround is to increase the number of levels to 1000, but that takes a lot of computation time.
Is there a way to plot only the contour levels that I need, that is, between 0 and 100?
An example of a desired output would be:
With the white space being the continuation of the plot without resizing the levels.
The code I'm using is the one given after the first image.
There are a few possible ideas here. The one I very much prefer is a logarithmic representation of the data. An example would be
from matplotlib import ticker
fig = plt.figure(1)
cont1 = plt.contourf(x,y,z,cmap="jet",locator=ticker.LogLocator(numticks=10))
plt.colorbar(cont1)
plt.show()
fig = plt.figure(2)
cont2 = plt.contourf(x,y,np.log10(z),100,cmap="jet")
plt.colorbar(cont2)
plt.show()
The first example uses matplotlib's LogLocator. The second one directly computes the logarithm of the data and plots that normally.
The third example just caps all data above 100.
fig = plt.figure(3)
zcapped = z.copy()
zcapped[zcapped>100]=100
cont3 = plt.contourf(x,y,zcapped,100,cmap="jet")
cbar = plt.colorbar(cont3)
plt.show()
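A closely related variant, not shown above, is to pass explicit levels and let contourf clip everything above 100 via extend="max", which avoids copying the data:
import numpy as np
import matplotlib.pyplot as plt
u = np.linspace(-3, 3, 1000)
x, y = np.meshgrid(u, u)
z = (1 - x)**2 + 100 * (y - x**2)**2
levels = np.linspace(0, 100, 101)  # only the range of interest
cont4 = plt.contourf(x, y, z, levels=levels, cmap="jet", extend="max")
plt.colorbar(cont4)
plt.show()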

Plotting mplot3d / axes3D xyz surface plot with log scale?

I've been looking high and low for a solution to this simple problem but I can't find it anywhere! There are loads of posts detailing semilog/loglog plotting of data in 2D, e.g. ax.set_xscale('log'), but I'm interested in using log scales on a 3D plot (mplot3d).
I don't have the exact code to hand and so can't post it here, however the simple example below should be enough to explain the situation. I'm currently using Matplotlib 0.99.1 but should shortly be updating to 1.0.0 - I know I'll have to update my code for the mplot3d implementation.
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FixedLocator, FormatStrFormatter
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = Axes3D(fig)
X = np.arange(-5, 5, 0.025)
Y = np.arange(-5, 5, 0.025)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet)
ax.set_zlim3d(-1.01, 1.01)
ax.w_zaxis.set_major_locator(LinearLocator(10))
ax.w_zaxis.set_major_formatter(FormatStrFormatter('%.03f'))
fig.colorbar(surf)
plt.show()
The above code will plot fine in 3D, however the three scales (X, Y, Z) are all linear. My 'Y' data spans several orders of magnitude (like 9!), so it would be very useful to plot it on a log scale. I can work around this by taking the log of 'Y', recreating the numpy array and plotting log(Y) on a linear scale, but in true Python style I'm looking for a smarter solution that will plot the data on a log scale.
Is it possible to produce a 3D surface plot of my XYZ data using log scales, ideally I'd like X & Z on linear scales and Y on a log scale?
Any help would be greatly appreciated. Please forgive any obvious mistakes in the above example; as mentioned, I don't have my exact code to hand and have altered a matplotlib gallery example from memory.
Thanks
Since I encountered the same question and Alejandro's answer did not produce the desired results, here is what I found out so far.
Log scaling for axes in 3D is an ongoing issue in matplotlib. Currently you can only relabel the axes with:
ax.yaxis.set_scale('log')
This will, however, not scale the axis logarithmically, only label it logarithmically.
ax.set_yscale('log') will cause an exception in 3D.
See GitHub issue 209.
Therefore you still have to recreate the numpy array.
I came up with a nice and easy solution taking inspiration from Issue 209. You define a small formatter function in which you set your own notation.
import matplotlib.ticker as mticker
# My axis should display 10⁻¹, but you can switch to e-notation (1.00e-01)
def log_tick_formatter(val, pos=None):
    return f"$10^{{{int(val)}}}$"  # remove int() if you don't use MaxNLocator
    # return f"{10**val:.2e}"  # e-notation
ax.zaxis.set_major_formatter(mticker.FuncFormatter(log_tick_formatter))
ax.zaxis.set_major_locator(mticker.MaxNLocator(integer=True))
set_major_locator restricts the exponents to integers (10⁻¹, 10⁻², ...), excluding values like 10^-1.5. Source
Important: remove the int() cast in the return statement if you don't use set_major_locator and want to display 10^-1.5; otherwise it will still print 10⁻¹ instead of 10^-1.5.
Example:
Try it yourself!
from mpl_toolkits.mplot3d import axes3d
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
fig = plt.figure(figsize=(11,8))
ax1 = fig.add_subplot(121,projection="3d")
# Grab some test data.
X, Y, Z = axes3d.get_test_data(0.05)
# Now Z has a range from 10⁻³ until 10³, so 6 magnitudes
Z = (np.full((120, 120), 10)) ** (Z / 20)
ax1.plot_wireframe(X, Y, Z, rstride=10, cstride=10)
ax1.set(title="Linear z-axis (small values not visible)")
def log_tick_formatter(val, pos=None):
    return f"$10^{{{int(val)}}}$"
ax2 = fig.add_subplot(122, projection="3d")
# You still have to take log10(Z), but that's just one operation
ax2.plot_wireframe(X, Y, np.log10(Z), rstride=10, cstride=10)
ax2.zaxis.set_major_formatter(mticker.FuncFormatter(log_tick_formatter))
ax2.zaxis.set_major_locator(mticker.MaxNLocator(integer=True))
ax2.set(title="Logarithmic z-axis (much better)")
plt.savefig("LinearLog.png", bbox_inches='tight')
plt.show()
On OS X: running ax.zaxis._set_scale('log') (notice the underscore) worked.
There is no direct solution because of issue 209. However, you can try doing this:
ax.plot_surface(X, np.log10(Y), Z, cmap='jet', linewidth=0.5)
If there is a 0 in Y, a warning will appear, but it still works. Because of this warning, colormaps don't work, so try to avoid zeros and negative numbers. For example:
Y[Y != 0] = np.log10(Y[Y != 0])
ax.plot_surface(X, Y, Z, cmap='jet', linewidth=0.5)
I wanted a symlog plot and, since I fill the data array by hand, I just made a custom function to calculate the log to avoid having negative bars in the bar3d if the data is < 1:
import math

def manual_log(data):
    if data < 10:  # linear scaling for outputs up to 1
        return data / 10
    else:  # log scale for outputs of 1 and above
        return math.log10(data)
Since I have no negative values, I did not implement handling them in this function, but it should not be hard to add.
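For illustration only, the function could be applied elementwise before handing the heights to bar3d (the array below is made up):
import numpy as np
heights = np.array([0.5, 3.0, 12.0, 450.0, 9000.0])  # hypothetical bar heights
scaled = np.vectorize(manual_log)(heights)  # apply manual_log elementwise
# 'scaled' would then be passed to bar3d in place of the raw heights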
