I am trying to plot a large dataset with a scatter plot.
I want to use matplotlib to plot it with a single-pixel marker.
The issue seems to have been solved in this pull request:
https://github.com/matplotlib/matplotlib/pull/695
But I cannot find a mention of how to get a single pixel marker.
My simplified dataset (data.csv)
Length,Time
78154393,139.324091
84016477,229.159305
84626159,219.727537
102021548,225.222662
106399706,221.022827
107945741,206.760239
109741689,200.153263
126270147,220.102802
207813132,181.67058
610704756,50.59529
623110004,50.533158
653383018,52.993885
659376270,53.536834
680682368,55.97628
717978082,59.043843
My code is below.
import pandas as pd
import os
import numpy
import matplotlib.pyplot as plt
inputfile='data.csv'
iplevel = pd.read_csv(inputfile)
base = os.path.splitext(inputfile)[0]
fig = plt.figure()
plt.yscale('log')
#plt.xscale('log')
plt.title(' My plot: '+base)
plt.xlabel('x')
plt.ylabel('y')
plt.scatter(iplevel['Time'], iplevel['Length'],color='black',marker=',',lw=0,s=1)
fig.tight_layout()
fig.savefig(base+'_plot.png', dpi=fig.dpi)
You can see below that the points are not single pixel.
Any help is appreciated
The problem
I fear that the bugfix discussed in the matplotlib pull request you cite is only valid for plt.plot() and not for plt.scatter().
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(4,2))
ax = fig.add_subplot(121)
ax2 = fig.add_subplot(122, sharex=ax, sharey=ax)
ax.plot([1, 2],[0.4,0.4],color='black',marker=',',lw=0, linestyle="")
ax.set_title("ax.plot")
ax2.scatter([1,2],[0.4,0.4],color='black',marker=',',lw=0, s=1)
ax2.set_title("ax.scatter")
ax.set_xlim(0,8)
ax.set_ylim(0,1)
fig.tight_layout()
print(fig.dpi)  # prints 80 in my case
fig.savefig('plot.png', dpi=fig.dpi)
The solution: Setting the markersize
The solution is to use a usual "o" or "s" marker, but set the markersize to be exactly one pixel. Since the markersize is given in points, one would need to use the figure dpi to calculate the size of one pixel in points. This is 72./fig.dpi.
For a plot, the markersize is given directly through the ms argument:
ax.plot(..., marker="o", ms=72./fig.dpi)
For a scatter, the markersize is given through the s argument, which is in square points:
ax.scatter(..., marker='o', s=(72./fig.dpi)**2)
Complete example:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(4,2))
ax = fig.add_subplot(121)
ax2 = fig.add_subplot(122, sharex=ax, sharey=ax)
ax.plot([1, 2],[0.4,0.4], marker='o',ms=72./fig.dpi, mew=0,
color='black', linestyle="", lw=0)
ax.set_title("ax.plot")
ax2.scatter([1,2],[0.4,0.4],color='black', marker='o', lw=0, s=(72./fig.dpi)**2)
ax2.set_title("ax.scatter")
ax.set_xlim(0,8)
ax.set_ylim(0,1)
fig.tight_layout()
fig.savefig('plot.png', dpi=fig.dpi)
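The conversion used above can be wrapped in a small helper. This is only a sketch: the function name one_pixel_sizes is hypothetical (not part of matplotlib), and it assumes the figure is saved at the same dpi that is passed in.

```python
import math


def one_pixel_sizes(dpi):
    """Return (ms, s) for a one-pixel marker at the given dpi.

    ms is the markersize for plot() in points,
    s is the marker area for scatter() in points squared.
    Hypothetical helper, not a matplotlib function.
    """
    pixel_in_points = 72.0 / dpi  # one point is 1/72 inch
    return pixel_in_points, pixel_in_points ** 2


# At 72 dpi one pixel is exactly one point.
ms, s = one_pixel_sizes(72)
# At 100 dpi a pixel is smaller than a point.
ms_100, s_100 = one_pixel_sizes(100)
```

The returned values would then be passed as ax.plot(..., ms=ms) and ax.scatter(..., s=s), with the same dpi given to fig.savefig.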
For anyone still trying to figure this out, the solution I found was to specify the s argument in plt.scatter.
The s argument refers to the area of the point you are plotting.
It doesn't seem to be quite perfect, since s=1 seems to cover about 4 pixels of my screen, but this definitely makes them smaller than anything else I've been able to find.
https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.scatter.html
s : scalar or array_like, shape (n, ), optional
size in points^2. Default is rcParams['lines.markersize'] ** 2.
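One plausible reason why s=1 covers more than one pixel: s is an area in points squared, so the marker is about sqrt(s) points wide, and a point is larger than a pixel whenever the dpi is above 72. The sketch below does that arithmetic; marker_width_px is a hypothetical helper name, and the dpi of 100 is an assumed example value.

```python
import math


def marker_width_px(s, dpi):
    """Approximate marker width in pixels for a scatter size s (points^2).

    Hypothetical helper: sqrt(s) is the width in points,
    and one point spans dpi / 72 pixels.
    """
    return math.sqrt(s) * dpi / 72.0


# Assumed example: s=1 at 100 dpi is wider than one pixel,
# so it can straddle pixel boundaries and touch a 2x2 block.
width = marker_width_px(1, 100)
```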
Set the plt.scatter() parameter to linewidths=0 and figure out the right value for the parameter s.
Source: https://stackoverflow.com/a/45803960/4063622
Related
I have a parallel coordinates plot with lots of data points so I'm trying to use a continuous colour bar to represent that, which I think I have worked out. However, I haven't been able to remove the default key that is put in when creating the plot, which is very long and hinders readability. Is there a way to remove this table to make the graph much easier to read?
This is the code I'm currently using to generate the parallel coordinates plot:
parallel_coordinates(data[[' male_le', ' female_le', 'diet', 'activity', 'obese_perc', 'median_income']],
                     'median_income', colormap='rainbow', alpha=0.5)
fig, ax = plt.subplots(figsize=(6, 1))
fig.subplots_adjust(bottom=0.5)
cmap = mpl.cm.rainbow
bounds = [0.00,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
norm = mpl.colors.BoundaryNorm(bounds, cmap.N,)
plt.colorbar(mpl.cm.ScalarMappable(norm = norm, cmap=cmap),cax = ax, orientation = 'horizontal',
label = 'normalised median income', alpha = 0.5)
plt.show()
Current Output:
I want my legend to be represented as a color bar, like this:
Any help would be greatly appreciated. Thanks.
You can use ax.legend_.remove() to remove the legend.
The cax parameter of plt.colorbar indicates the subplot in which to put the colorbar. If you leave it out, matplotlib will create a new subplot, "stealing" space from the current subplot (subplots are often referred to as ax in matplotlib). So here, leaving out cax will create the desired colorbar (adding ax=ax isn't necessary, since ax is the current subplot).
The code below uses seaborn's penguin dataset to create a standalone example.
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import numpy as np
from pandas.plotting import parallel_coordinates
penguins = sns.load_dataset('penguins')
fig, ax = plt.subplots(figsize=(10, 4))
cmap = plt.get_cmap('rainbow')
bounds = np.arange(penguins['body_mass_g'].min(), penguins['body_mass_g'].max() + 200, 200)
norm = mpl.colors.BoundaryNorm(bounds, 256)
penguins = penguins.dropna(subset=['body_mass_g'])
parallel_coordinates(penguins[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']],
'body_mass_g', colormap=cmap, alpha=0.5, ax=ax)
ax.legend_.remove()
plt.colorbar(mpl.cm.ScalarMappable(norm=norm, cmap=cmap),
ax=ax, orientation='horizontal', label='body mass', alpha=0.5)
plt.show()
For some reason, when I use a zorder with my scatter plot, the edges of the points overlap the axis. I tried some of the solutions from the question "matplotlib axis tick labels covered by scatterplot (using spines)", but they didn't work for me. Is there a way to prevent this from happening?
I understand I could also add an ax.axvline() at my boundaries but that would be an annoying workaround for lots of plots.
xval = np.array([0,0,0,3,3,3,0,2,3,0])
yval = np.array([0,2,3,5,1,0,1,0,4,5])
zval = yval**2-4
fig = plt.figure(figsize=(6,6))
ax = plt.subplot(111)
ax.scatter(xval,yval,cmap=plt.cm.rainbow,c=zval,s=550,zorder=20)
ax.set_ylim(0,5)
ax.set_xlim(0,3)
#These don't work
ax.tick_params(labelcolor='k', zorder=100)
ax.tick_params(direction='out', length=4, color='k', zorder=100)
#This will work but I don't want to have to do this for the plot edges every time
ax.axvline(0,c='k',zorder=100)
plt.show()
For me the solution you linked to works; that is, setting the z-order of the scatter plot to a negative number. E.g.
xval = np.array([0,0,0,3,3,3,0,2,3,0])
yval = np.array([0,2,3,5,1,0,1,0,4,5])
zval = yval**2-4
fig = plt.figure(figsize=(6,6))
ax = plt.subplot(111)
ax.scatter(xval,yval,cmap=plt.cm.rainbow,c=zval,s=550,zorder=-1)
ax.set_ylim(0,5)
ax.set_xlim(0,3)
plt.show()
You can fix the overlap using the following code with a large number for the zorder. This will work on both the x- and y-axis.
for k, spine in ax.spines.items():
    spine.set_zorder(1000)
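Applied to the data from the question, the spine approach might look like the following sketch. The Agg backend is an assumption made here so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

xval = np.array([0, 0, 0, 3, 3, 3, 0, 2, 3, 0])
yval = np.array([0, 2, 3, 5, 1, 0, 1, 0, 4, 5])
zval = yval**2 - 4

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(xval, yval, c=zval, cmap=plt.cm.rainbow, s=550, zorder=20)
ax.set_xlim(0, 3)
ax.set_ylim(0, 5)

# Raise all four spines above the scatter points (which use zorder=20).
for spine in ax.spines.values():
    spine.set_zorder(1000)

# fig.savefig("spines_on_top.png")  # uncomment to write the image
```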
This works for me
import numpy as np
import matplotlib.pyplot as plt
xval = np.array([0,0,0,3,3,3,0,2,3,0])
yval = np.array([0,2,3,5,1,0,1,0,4,5])
zval = yval**2-4
fig = plt.figure(figsize=(6,6))
ax = plt.subplot(111)
ax.scatter(xval,yval,cmap=plt.cm.rainbow,c=zval,s=550,zorder=20)
ax.set_ylim(-1,6)
ax.set_xlim(-1,4)
#These don't work
ax.tick_params(labelcolor='k', zorder=100)
ax.tick_params(direction='out', length=4, color='k', zorder=100)
#This will work but I don't want to have to do this for the plot edges every time
ax.axvline(0,c='k',zorder=100)
plt.show()
Your circles are big enough that they extend beyond the axes limits, so we simply widen ylim and xlim.
Changed
ax.set_ylim(0,5)
ax.set_xlim(0,3)
to
ax.set_ylim(-1,6)
ax.set_xlim(-1,4)
Also, zorder doesn't play a role in pushing the points to edges.
Below I created a simple example of my dataset. I have 4 points, and at each step their values change. The points are plotted in the x,y plane and I want their size to change with their value. There is also another problem: the points are connected by a line, which I don't want. (I cannot use plt.scatter.)
import pandas as pd
import matplotlib.pyplot as plt
data=[[1,1,3],[1,2,1],[2,1,9],[2,2,0]]
a=pd.DataFrame(data)
a.columns=['x','y','value']
data2=[[1,1,5],[1,2,2],[2,1,1],[2,2,3]]
b=pd.DataFrame(data2)
b.columns=['x','y','value']
data3=[[1,1,15],[1,2,7],[2,1,4],[2,2,8]]
c=pd.DataFrame(data3)
c.columns=['x','y','value']
final=[a,b,c]
for i in range(0,len(final)):
    fig, ax = plt.subplots()
    plt.plot(final[i]['x'],final[i]['y'],marker='o',markersize=22)
With this I can fix the marker size, but a line appears between the points. How can I remove it?
If I pass the values as the markersize, it doesn't work:
for i in range(0,len(final)):
    fig, ax = plt.subplots()
    plt.plot(final[i]['x'],final[i]['y'],marker='o',markersize=final[i]['value'])
As I said before, the result I want is a plot in which there are only the points with different dimensions depending on their value.
Since you cannot use scatter, you need to loop over the values to use the markersize as it does not accept arrays but a scalar. Moreover, to just plot a marker, you use 'o' for a circle. I used size*5 to enlarge the circles further.
for i in range(0,len(final)):
    fig, ax = plt.subplots()
    for x, y, size in zip(final[i]['x'],final[i]['y'], final[i]['value']):
        plt.plot(x, y, 'o', markersize=size*5)
In case you want to plot them as subplots
fig, axes = plt.subplots(1,3, figsize=(9, 2))
for i in range(0,len(final)):
    for x, y, size in zip(final[i]['x'],final[i]['y'], final[i]['value']):
        axes[i].plot(x, y, 'o', markersize=size*5)
plt.tight_layout()
You have an argument for the line width in plt.plot graphs. Please set it to zero.
plt.plot(final[i]["x"], final[i]["y"], marker="o", markersize=22, linewidth=0)
I am playing around with volumetric data and I am trying to project a "cosmic web" like image.
I create a file path and open the data with a module that reads hdf5 files. The x and y values are obtained by indexing into the array gas_pos, and the histogram is weighted by different properties, gas_density in this case:
import matplotlib  # needed for matplotlib.colors.LogNorm below
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap
from matplotlib.ticker import LogFormatter
cmap = LinearSegmentedColormap.from_list('mycmap', ['black', 'steelblue', 'mediumturquoise', 'darkslateblue'])
fig = plt.figure()
ax = fig.add_subplot(111)
H = ax.hist2d(gas_pos[:,0]/0.7, gas_pos[:,1]/0.7, bins=500, cmap=cmap, norm=matplotlib.colors.LogNorm(), weights=gas_density);
cb = fig.colorbar(H[3], ax=ax, shrink=0.8, pad=0.01, orientation="horizontal", label=r'$ \rho\ [M_{\odot}\ \mathrm{kpc}^{-3}]$')
ax.tick_params(axis=u'both', which=u'both',length=0)
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
giving me this:
which is nice, but I want to improve the quality and remove the graininess. When I try imshow interpolation:
cmap = LinearSegmentedColormap.from_list('mycmap', ['black', 'steelblue', 'mediumturquoise', 'darkslateblue'])
fig = plt.figure()
ax = fig.add_subplot(111)
H = ax.hist2d(gas_pos[:,0]/0.7, gas_pos[:,1]/0.7, bins=500, cmap=cmap, norm=matplotlib.colors.LogNorm(), weights=gas_density);
ax.tick_params(axis=u'both', which=u'both',length=0)
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
im = ax.imshow(H[0], cmap=cmap, interpolation='sinc', norm=matplotlib.colors.LogNorm())
cb = fig.colorbar(H[3], ax=ax, shrink=0.8, pad=0.01, orientation="horizontal", label=r'$ \rho\ [M_{\odot}\ \mathrm{kpc}^{-3}]$')
plt.show()
Am I using this incorrectly? or is there something better I can use to modify the pixelation?
If anyone is wanting to play with my data, I will upload the data later on today!
Using interpolation='sinc' is indeed a good method to smoothen a plot. Others would e.g. be "gaussian", "bicubic" or "spline16".
The problem you observe is that the imshow plot is plotted on top of the hist2d plot and thus takes its axes limits. Those limits seem to be smaller than the number of points in the imshow plot and therefore you only see part of the total data.
The solution is either not to plot the hist2d plot at all or at least to plot it into another subplot or figure.
Pursuing the first idea, you would calculate your histogram without plotting it, using numpy.histogram2d:
H, xedges, yedges = np.histogram2d(gas_pos[:,0]/0.7, gas_pos[:,1]/0.7,
bins=500, weights=gas_density)
im = ax.imshow(H.T, cmap=cmap, interpolation='sinc', norm=matplotlib.colors.LogNorm())
I would also recommend reading the numpy.histogram2d documentation, which includes an example of plotting the histogram output in matplotlib.
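A self-contained sketch of this approach is given below. The random gas_pos and gas_density arrays are stand-ins for the real data (an assumption), as are the bin count, the "gaussian" interpolation, the "viridis" colormap and the Agg backend. Empty bins are masked so that the logarithmic norm only sees positive values, and extent maps the image back to data coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

rng = np.random.default_rng(0)
# Synthetic stand-ins for the real gas_pos and gas_density arrays.
gas_pos = rng.normal(size=(100_000, 2))
gas_density = rng.uniform(0.1, 1.0, size=100_000)

# Compute the weighted histogram without plotting it.
H, xedges, yedges = np.histogram2d(
    gas_pos[:, 0], gas_pos[:, 1], bins=200, weights=gas_density
)

# Mask empty bins so LogNorm does not have to deal with zeros.
H_masked = np.ma.masked_less_equal(H, 0)

fig, ax = plt.subplots()
im = ax.imshow(
    H_masked.T,  # histogram2d puts x on the first axis; imshow wants rows = y
    origin="lower",
    extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]],
    interpolation="gaussian",
    norm=LogNorm(),
    cmap="viridis",
)
fig.colorbar(im, ax=ax, orientation="horizontal", label="weighted density")
# fig.savefig("hist2d_smooth.png")  # uncomment to write the image
```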
You'll probably want to set interpolation='None' in the call to imshow, instead of interpolation='sinc'
I'm working with matplotlib to plot a variable in latitude/longitude coordinates. The problem is that this image cannot include axes or borders. I have been able to remove the axes, but the white padding around my image has to be completely removed (see example images produced by the code below here: http://imgur.com/a/W0vy9).
I have tried several methods from Google searches, including these StackOverflow methodologies:
Remove padding from matplotlib plotting
How to remove padding/border in a matplotlib subplot (SOLVED)
Matplotlib plots: removing axis, legends and white spaces
but nothing has worked in removing the white space. If you have any advice (even if it is to ditch matplotlib and to try another plotting library instead) I would appreciate it!
Here is a basic form of the code I'm using that shows this behavior:
import numpy as np
import matplotlib.pyplot  # binds matplotlib and loads matplotlib.pyplot, which is used below
from mpl_toolkits.basemap import Basemap
from scipy import stats
lat = np.random.randint(-60.5, high=60.5, size=257087)
lon = np.random.randint(-179.95, high=180, size=257087)
maxnsz = np.random.randint(12, 60, size=257087)
percRange = np.arange(100,40,-1)
percStr=percRange.astype(str)
val_percentile=np.percentile(maxnsz, percRange, interpolation='nearest')
#Rank all values
all_percentiles=stats.rankdata(maxnsz)/len(maxnsz)
#Figure setup
fig = matplotlib.pyplot.figure(frameon=False, dpi=600)
#Basemap code can go here
x=lon
y=lat
cmap = matplotlib.cm.get_cmap('cool')
h=np.where(all_percentiles >= 0.999)
hl=np.where((all_percentiles < 0.999) & (all_percentiles > 0.90))
mh=np.where((all_percentiles > 0.75) & (all_percentiles < 0.90))
ml=np.where((all_percentiles >= 0.4) & (all_percentiles < 0.75))
l=np.where(all_percentiles < 0.4)
all_percentiles[h]=0
all_percentiles[hl]=0.25
all_percentiles[mh]=0.5
all_percentiles[ml]=0.75
all_percentiles[l]=1
rgba_low=cmap(1)
rgba_ml=cmap(0.75)
rgba_mh=cmap(0.51)
rgba_hl=cmap(0.25)
rgba_high=cmap(0)
matplotlib.pyplot.axis('off')
matplotlib.pyplot.scatter(x[ml],y[ml], c=rgba_ml, s=3, marker=',',edgecolor='none', alpha=0.4)
matplotlib.pyplot.scatter(x[mh],y[mh], c=rgba_mh, s=3, marker='o', edgecolor='none', alpha=0.5)
matplotlib.pyplot.scatter(x[hl],y[hl], c=rgba_hl, s=4, marker='*',edgecolor='none', alpha=0.6)
matplotlib.pyplot.scatter(x[h],y[h], c=rgba_high, s=5, marker='^', edgecolor='none',alpha=0.75)
fig.savefig('/home/usr/code/python/testfig.jpg', bbox_inches=0, nbins=0, transparent="True", pad_inches=0.0)
fig.canvas.draw()
The problem is that all the solutions given at Matplotlib plots: removing axis, legends and white spaces are actually meant to work with imshow.
So, the following clearly works
import matplotlib.pyplot as plt
fig = plt.figure()
ax=fig.add_axes([0,0,1,1])
ax.set_axis_off()
im = ax.imshow([[2,3,4,1], [2,4,4,2]], origin="lower", extent=[1,4,2,8])
ax.plot([1,2,3,4], [2,3,4,8], lw=5)
ax.set_aspect('auto')
plt.show()
and produces
But here, you are using scatter. Adding a scatter plot
import matplotlib.pyplot as plt
fig = plt.figure()
ax=fig.add_axes([0,0,1,1])
ax.set_axis_off()
im = ax.imshow([[2,3,4,1], [2,4,4,2]], origin="lower", extent=[1,4,2,8])
ax.plot([1,2,3,4], [2,3,4,8], lw=5)
ax.scatter([2,3,4,1], [2,3,4,8], c="r", s=2500)
ax.set_aspect('auto')
plt.show()
produces
Scatter has the particularity that matplotlib tries to make all points visible by default, which means that the axes limits are set such that all scatter points are visible as a whole.
To overcome this, we need to specifically set the axes limits:
import matplotlib.pyplot as plt
fig = plt.figure()
ax=fig.add_axes([0,0,1,1])
ax.set_axis_off()
im = ax.imshow([[2,3,4,1], [2,4,4,2]], origin="lower", extent=[1,4,2,8])
ax.plot([1,2,3,4], [2,3,4,8], lw=5)
ax.scatter([2,3,4,1], [2,3,4,8], c="r", s=2500)
ax.set_xlim([1,4])
ax.set_ylim([2,8])
ax.set_aspect('auto')
plt.show()
such that we will get the desired behaviour.
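Transferred to a latitude/longitude scatter like the one in the question, the same idea might look like the sketch below. The random coordinates, the figure size, the limits of -180..180 and -60..60, and the Agg backend are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the real longitude/latitude data.
lon = rng.uniform(-180, 180, size=5000)
lat = rng.uniform(-60, 60, size=5000)

fig = plt.figure(figsize=(6, 2), dpi=100, frameon=False)
ax = fig.add_axes([0, 0, 1, 1])  # axes fill the whole figure, leaving no padding
ax.set_axis_off()
ax.scatter(lon, lat, s=3, marker=",", edgecolor="none", alpha=0.4)

# Pin the limits to the data range so that the margins scatter adds
# by default do not create a border around the points.
ax.set_xlim(-180, 180)
ax.set_ylim(-60, 60)

# fig.savefig("testfig.png", pad_inches=0)  # uncomment to write the image
```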