How to scatter plot 2d array in Python - python

How do you plot a scatter plot for an array result_array of shape (1087, 2) that looks like this:
array([[-1.89707840e+03, 3.99819932e+00],
[-2.55018840e+03, -2.61913223e+00],
[-1.85480840e+03, -2.36545732e-01],
...,
[-1.64432840e+03, 9.79555441e+00],
[-1.59022840e+03, 1.08955493e+01],
[-1.73963840e+03, 3.60132161e-01]])
?
Update:
Tried:
import matplotlib.pyplot as plt
plt.scatter(result_array[:, 0], result_array[:, 1])
plt.show()
and the plot looks like this:

Assuming that the array is X:
import matplotlib.pyplot as plt
plt.scatter(X[:, 0], X[:, 1])
plt.show()
plt.scatter() has many additional options; see the documentation for details.
Answer to the updated question:
It seems that you have an outlier row in the array with the first coordinate close to 2.5*10^6 (which gives the point close to the right margin of the plot), while other rows have their first coordinates smaller by a few orders of magnitude. For example, the rows in the part of the array visible in the question have first coordinates close to -2000. For this reason, these rows are squished into what looks like a vertical line in the plot.
There are two possible ways to fix it:
If you really have only one (or just a few) outliers, you can remove them from the array and possibly plot them separately.
Alternatively, if you want to plot all points at once, then using a logarithmic scale on the x-axis may help. Since you have some points with negative first coordinates, you would need to use the symmetric logarithmic scale, which is logarithmic in both the positive and negative directions of the x-axis:
import matplotlib.pyplot as plt
plt.scatter(X[:, 0], X[:, 1])
plt.xscale('symlog')
plt.show()
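The first option, removing the outliers before plotting, could be sketched roughly as follows. This is only an illustration: the helper name split_outliers, the median-absolute-deviation rule, and the threshold of 10 are assumptions rather than anything the question specifies, and the sample rows are made up.

```python
import numpy as np

def split_outliers(points, n_mads=10.0):
    """Split an (N, 2) array into (inliers, outliers) by the first column.

    Hypothetical helper: a row counts as an outlier when its x value lies
    more than `n_mads` median absolute deviations away from the median x.
    """
    x = points[:, 0]
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    is_outlier = np.abs(x - median) > n_mads * mad
    return points[~is_outlier], points[is_outlier]

# Made-up rows shaped like the array in the question, plus one far-away row.
X = np.array([
    [-1897.0, 4.0],
    [-2550.0, -2.6],
    [-1854.0, -0.24],
    [-1644.0, 9.8],
    [2.5e6, 1.0],
])
inliers, outliers = split_outliers(X)
```

The inliers can then be passed to plt.scatter(inliers[:, 0], inliers[:, 1]), with the outliers drawn in a separate figure if they are of interest.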

Related

Horizontal line to infinity on one side only in matplotlib

I'd like to plot a line that goes to infinity, but starting from a finite point. For simplicity, let's say that the line can be horizontal. I would like to plot a line from (0, 0) to (inf, 0).
Using hlines:
>>> fig, ax = plt.subplots()
>>> ax.hlines(0, 0, np.inf)
.../python3.8/site-packages/matplotlib/axes/_base.py:2480: UserWarning: Warning: converting a masked element to nan.
xys = np.asarray(xys)
The result is an empty plot.
axhline has a starting parameter, but it is in axis coordinates rather than data. Similar problem for axline. Is there a way to plot a (horizontal) line with one end in data coordinates and the other at infinity?
The motivation behind this is that I'd like to be able to plot some cumulative probabilities without setting data past the last bin to zero, as here: Matplotlib cumulative histogram - vertical line placement bug or misinterpretation?. Rather than simply ending the histogram, I'd like to be able to extend the line from the last bin to infinity at y=1.0.
There's no built-in function for this, but you can re-draw the line to the axis limit on each change of the x limits.
From Axes:
The events you can connect to are 'xlim_changed' and 'ylim_changed'
and the callback will be called with func(ax) where ax is the Axes
instance.
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
def hline_to_inf(ax, x, y):
    # Draw from (x, y) to the current right edge of the axes
    line = ax.hlines(y, x, ax.get_xlim()[1])
    # Re-draw the segment whenever the x limits change
    ax.callbacks.connect('xlim_changed',
                         lambda ax: line.set_paths([[[x, y], [ax.get_xlim()[1], y]]]))
hline_to_inf(ax, 0, 0)
plt.show()
Part of the issue is that normal plotting methods apply the same transform to the input data. What is required here is to apply a data transform to the start point, and a blended transform to the endpoint. It seems that there may be an answer using existing tools with ConnectionPatch, as explained in the Annotations Guide. The idea is to make the left point use data coordinates and the right point have a blended transform with x in axes coordinates and y in data.
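To make the blended transform concrete, here is a small numerical check, offered as a sketch rather than as part of the answer's method. It assumes a non-interactive backend and arbitrary axis limits chosen for illustration. It shows that x = 1.0 under ax.get_yaxis_transform() always lands on the right edge of the axes, whatever the data limits are.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.set_xlim(0, 10)   # arbitrary limits, chosen only for illustration
ax.set_ylim(-1, 1)

# x is interpreted in axes coordinates (0 = left edge, 1 = right edge),
# y is interpreted in data coordinates.
blended = ax.get_yaxis_transform()

# Pixel position of "right edge, y = 0" versus the data point (10, 0).
right_edge = blended.transform((1.0, 0.0))
data_point = ax.transData.transform((10.0, 0.0))
same_before = np.allclose(right_edge, data_point)

# After changing the x limits, the blended point is still at the right edge,
# which now corresponds to x = 100 in data coordinates.
ax.set_xlim(0, 100)
right_edge_after = blended.transform((1.0, 0.0))
still_at_edge = np.allclose(right_edge_after, right_edge)
```

This is why an endpoint expressed in the blended transform follows the axes edge when the view is panned or zoomed, while an endpoint in pure data coordinates does not.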
from matplotlib import pyplot as plt
from matplotlib.patches import ConnectionPatch
fig, ax = plt.subplots()
line, = ax.plot([1, 2], [1, 2])
ax.add_artist(ConnectionPatch([2, 2], [1, 2],
                              coordsA=ax.transData,
                              coordsB=ax.get_yaxis_transform(),
                              color=line.get_color(),
                              linewidth=line.get_linewidth(),
                              clip_on=True))
Turning on clipping is necessary, otherwise you could end up with artifacts that look like this:

How to use indices of 2D array to generate heatmap with matplotlib/seaborn?

I have a 2D numpy array A that contains intensity values corresponding to each point. I am trying to generate a heatmap for these points. For example, if the value at A[3,0] is 230, then I want the square with x-value of 3 and y-value of 0 to correspond to a value of 230.
I've tried simply using the Seaborn heatmap function.
import numpy as np
import seaborn as sns
np.random.seed(0)
A = np.random.uniform(0,500,(4,3))
sns.heatmap(A, square=True)
Where A is just a 4x3 numpy array of random values. The output, however,
is a region that looks like the matrix A.
This is a 4x3 region where each point corresponds to a point in the matrix A if it were written out. But I'm not sure how to get it such that I create a heatmap using the actual indices of the matrix as the points. The heatmap in mind would actually be 3x4 and resemble part of a coordinate plane, with x-values of [0,1,2,3] and y-values of [0,1,2].
Sorry if I've explained poorly. I might be severely misunderstanding how a heatmap works, but is there a way to do this(either with or without the heatmap function)? Thanks for reading.
I think you are confused not about what a heatmap is, but about what the indices of an array represent. In a 2D array, the first index is the row, and the second index is the column. There is no explicit concept of Cartesian x- and y-coordinates.
That said, you can get what you want by creating a heatmap of the array's transpose, then setting the limits of the x-axis and y-axis so that (0, 0) is in the bottom-left corner.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(0)
A = np.random.uniform(0,500,(4,3))
sns.heatmap(A.T, square=True)
plt.xlim(0, A.shape[0])
plt.ylim(0, A.shape[1])
plt.show()
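If seaborn is not a requirement, a similar picture can be sketched with plain matplotlib. This swaps in pcolormesh for sns.heatmap, which is an alternative technique and not what the answer above uses. The random generator and the headless backend are assumptions made so the example is self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(0, 500, (4, 3))

fig, ax = plt.subplots()
# pcolormesh draws row r, column c of its input as the cell spanning
# x in [c, c + 1] and y in [r, r + 1], with (0, 0) at the bottom-left.
# Passing A.T therefore makes the first index of A the x coordinate,
# so A[3, 0] colours the cell at x = 3, y = 0.
mesh = ax.pcolormesh(A.T)
ax.set_aspect("equal")
fig.colorbar(mesh, ax=ax)
```

Call plt.show() or fig.savefig(...) to view the result. Because pcolormesh already places the origin at the bottom-left, no axis limits need to be flipped.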

How do I use axvfill with a boolean series

I have a boolean time series that I want to use to determine the parts of the plot that should be shaded.
Currently I have:
ax1.fill_between(data.index, r_min, r_max, where=data['USREC']==True, alpha=0.2)
where r_min and r_max are just the min and max of the y-axis.
But fill_between doesn't fill all the way to the top and bottom of the plot, so I wanted to use axvspan() instead.
Is there any easy way to do this given axvspan only takes coordinates? So the only way I can think of is to group all the dates that are next to each other and are True, then take the first and last of those dates and pass them into axvspan.
Thank you
You can still use fill_between, if you like. However, instead of specifying the y-coordinates in data coordinates (for which it is not a priori clear how large they need to be), you can specify them in axes coordinates. This can be achieved using a transform where the x part is in data coordinates and the y part is in axes coordinates. The xaxis transform is such a transform. (This is not very surprising, since the x-axis is always independent of the y-coordinates.) So
ax.fill_between(data.index, 0,1, where=data['USREC'], transform=ax.get_xaxis_transform())
would do the job.
Here is a complete example:
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(0)
x = np.linspace(0,100,350)
y = np.cumsum(np.random.normal(size=len(x)))
bo = np.zeros(len(y))
bo[y>5] = 1
fig, ax = plt.subplots()
ax.fill_between(x, 0, 1, where=bo, alpha=0.4, transform=ax.get_xaxis_transform())
plt.plot(x,y)
plt.show()
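For completeness, the approach described in the question (grouping neighbouring True values and passing each group to axvspan) could be sketched as below. The helper name true_runs is hypothetical, the data are synthetic, and the headless backend is an assumption made so the example runs anywhere.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def true_runs(mask):
    """Return (start, stop) index pairs, stop exclusive, for each run of True."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Even positions are run starts, odd positions are run ends.
    return list(zip(edges[::2].tolist(), edges[1::2].tolist()))

# Synthetic data in the spirit of the complete example above.
x = np.linspace(0, 100, 350)
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=len(x)))
flag = y > 5

fig, ax = plt.subplots()
ax.plot(x, y)
for start, stop in true_runs(flag):
    # Shade from the first to the last True sample of the run;
    # a run of a single sample therefore has zero width.
    ax.axvspan(x[start], x[stop - 1], alpha=0.4)

example_runs = true_runs([False, True, True, False, True, False, False, True])
```

With a pandas series, the same helper could be applied to data['USREC'].to_numpy() and the spans taken from data.index. The fill_between version is shorter, so this is mainly useful when separate span artists are wanted.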

How to display out of range values on image histogram?

I want to plot the RGB histograms of an image using numpy.histogram.
(See my function draw_histogram below)
It works well for a regular range of [0, 255] :
import numpy as np
import matplotlib.pyplot as plt
im = plt.imread('Bulbasaur.jpeg')
draw_histogram(im, minimum=0., maximum=255.)
What I want to do :
I expect the images I use to have out of range values. Sometimes they will be out of range, sometimes not. I want to use the RGB histogram to analyse how bad the values are out of range.
Let's say I expect the values to be at worst in the interval [-512, 512]. I still want the histogram to display the in-range intensities at the right spot, and leave blank the unpopulated range sections. For example, if I draw the histogram of Bulbasaur.jpeg again but with range [-512, 512], I expect to see the same histogram but contracted along the "x" axis (between the two dashed lines in the histogram below).
The problem :
When I try to draw the histogram for an irregular range, something goes wrong:
import numpy as np
import matplotlib.pyplot as plt
im = plt.imread('Bulbasaur.jpeg')
draw_histogram(im, minimum=-512., maximum=512.)
My code for draw_histogram() :
def draw_histogram(im, minimum, maximum):
    fig = plt.figure()
    color = ('r','g','b')
    for i, col in enumerate(color):
        hist, bins = np.histogram(im[:, :, i], int(maximum-minimum), (minimum, maximum))
        plt.plot(hist, color=col)
    plt.xlim([int(minimum), int(maximum)])
    # Draw vertical lines to easily locate the 'regular range'
    plt.axvline(x=0, color='k', linestyle='dashed')
    plt.axvline(x=255, color='k', linestyle='dashed')
    plt.savefig('Histogram_Bulbasaur.png')
    plt.close(fig)
    return 0
Question
Does anyone know a way of properly drawing an RGB histogram with irregular ranges?
You should pass x values to 'plt.plot'
I changed:
plt.plot(hist, color=col)
to this:
plt.plot(np.arange(minimum,maximum),hist, color=col)
With this change, the graph began to appear normally. Essentially, plt.plot was plotting the y-values returned by np.histogram against their indices, starting at 0. This works when your expected range starts at 0, but when you want to include negative numbers, the x-values should start at minimum instead, so using np.arange to assign the x-values explicitly fixes the problem.
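A variation on the same idea is to reuse the bin edges that np.histogram already returns, which avoids building the x-values separately. The sketch below uses a made-up stand-in for one image channel, since Bulbasaur.jpeg is not available here. Plotting at the bin centres rather than at the left edges is a choice of this sketch.

```python
import numpy as np

minimum, maximum = -512.0, 512.0

# Stand-in for one colour channel: every 8-bit intensity exactly once.
channel = np.arange(256, dtype=float)

hist, bin_edges = np.histogram(
    channel, bins=int(maximum - minimum), range=(minimum, maximum)
)

# There is one more edge than there are bins, so take the midpoint of each bin.
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
```

Inside draw_histogram this would become plt.plot(bin_centers, hist, color=col), and the counts then sit between the two dashed lines at 0 and 255.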

Matplotlib - Boxplot calculated on log10 values but shown in logarithmic scale

I think this is a simple question, but I just still can't seem to think of a simple solution. I have a set of data of molecular abundances, with values ranging many orders of magnitude. I want to represent these abundances with boxplots (box-and-whiskers plots), and I want the boxes to be calculated on log scale because of the wide range of values.
I know I can just calculate the log10 of the data and send it to matplotlib's boxplot, but this does not retain the logarithmic scale in plots later.
So my question is basically this:
When I have calculated a boxplot based on the log10 of my values, how do I convert the plot afterward to be shown on a logarithmic scale instead of linear with the log10 values?
I can change tick labels to partly fix this, but I have no clue how I get logarithmic scales back to the plot.
Or is there another more direct way to plotting this. A different package maybe that has this options already included?
Many thanks for the help.
I'd advise against doing the boxplot on the raw values and setting the y-axis to logarithmic, because the boxplot function is not designed to work across orders of magnitude and you may get too many outliers (depends on your data, of course).
Instead, you can plot the logarithm of the data and manually adjust the y-labels.
Here is a very crude example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FormatStrFormatter
np.random.seed(42)
values = 10 ** np.random.uniform(-3, 3, size=100)
fig = plt.figure(figsize=(9, 3))
ax = plt.subplot(1, 3, 1)
ax.boxplot(np.log10(values))
ax.set_yticks(np.arange(-3, 4))
ax.set_yticklabels(10.0**np.arange(-3, 4))
ax.set_title('log')
ax = plt.subplot(1, 3, 2)
ax.boxplot(values)
ax.set_yscale('log')
ax.set_title('raw')
ax = plt.subplot(1, 3, 3)
ax.boxplot(values, whis=[5, 95])
ax.set_yscale('log')
ax.set_title('5%')
plt.show()
The middle figure shows the box plot on the raw values. This leads to many outliers, because the maximum whisker length is computed as a multiple (default: 1.5) of the interquartile range (the box height), which does not scale across orders of magnitude.
Alternatively, you could specify to draw the whiskers for a given percentile range:
ax.boxplot(values, whis=[5, 95])
In this case you get a fixed fraction of outliers (5%) above and below.
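The effect of the whisker rule can also be checked numerically without drawing anything. The sketch below counts the points that fall beyond the default 1.5 × IQR whiskers, once on the raw values and once on their log10. The helper name count_beyond_whiskers is hypothetical, and the data are synthetic, drawn the same way as in the example above but with a different random generator.

```python
import numpy as np

def count_beyond_whiskers(data, whis=1.5):
    """Count points outside [Q1 - whis * IQR, Q3 + whis * IQR]."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return int(np.sum((data < lower) | (data > upper)))

rng = np.random.default_rng(42)
values = 10 ** rng.uniform(-3, 3, size=100)  # spans six orders of magnitude

n_fliers_raw = count_beyond_whiskers(values)            # whiskers computed on raw values
n_fliers_log = count_beyond_whiskers(np.log10(values))  # whiskers computed on log10 values
```

For data like this, the raw-value count is noticeably larger than the log10 count, which is the reason for computing the box plot on the logarithm in the first place.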
You can use plt.yscale:
plt.boxplot(data); plt.yscale('log')
