matplotlib ignore missing data


The question has already been asked and has a good solution using masks.
I'm asking again because I'd like to know whether there is a way to make matplotlib handle missing data on its own: if any x or y value is missing, just ignore that point and draw the line through the remaining ones.
Here's some sample code:
import numpy as np
import matplotlib.pyplot as plt
plt.figure()
x = np.arange(0, 100, 10)
y = np.random.randint(0, 10, 10)
plt.plot(x,y, "*-")
x_nan = np.arange(100)
y_nan = np.asarray([np.nan] * 100)
y_nan[::10] = np.random.randint(0, 10, 10)
plt.plot(x_nan,y_nan,"*-")
mask = np.isfinite(y_nan)
plt.plot(x_nan[mask],y_nan[mask],"--")
plt.show()
The second plot draws dots only for the non-nan points, but no line through them.
The easiest way to make it look like the first is to define a mask, as in the third plot. I'd like to know whether there is a way to make matplotlib behave like this automatically, without the extra mask.

Short answer: No!
Long answer: One could indeed imagine a feature built into matplotlib's plot function that removes NaNs from the input.
However, there is none.
But since the solution is essentially only one extra line of code, the fact that matplotlib does not provide this functionality is bearable.
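If the mask is needed in several places, that one extra line can be wrapped in a tiny helper. The sketch below is only an illustration: `plot_skipping_nan` is a hypothetical name, not a matplotlib function, and it drops a point when either its x or its y value is non-finite, matching the "if any of x or y data is missing" wording of the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def plot_skipping_nan(ax, x, y, *args, **kwargs):
    """Hypothetical helper: plot only the points where both x and y are finite."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.isfinite(x) & np.isfinite(y)  # False wherever x or y is NaN/inf
    return ax.plot(x[mask], y[mask], *args, **kwargs)


x_nan = np.arange(100)
y_nan = np.full(100, np.nan)
y_nan[::10] = np.random.randint(0, 10, 10)

fig, ax = plt.subplots()
line, = plot_skipping_nan(ax, x_nan, y_nan, "*-")
print(len(line.get_xdata()))  # 10: the 90 NaN entries were dropped
```

Because the masked arrays contain no NaNs, the line is drawn through all remaining points, just like the third plot in the question.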
Just as a fun fact: interestingly, a scatter plot does ignore NaN values, e.g.
line, = plt.plot(x_nan,y_nan,"-")
scatter = plt.scatter(x_nan,y_nan)
print(len(line.get_xdata())) # 100
print(len(scatter.get_offsets())) # 10
While the line still has 100 points, the scatter has only 10, because all NaN values are removed.

Related

How to style/format point markers in Plotly 3D scatterplot?

I am unsure how to customize scatterplot marker styles in Plotly scatterplots.
Specifically, I have a column predictions that is 0 or 1 (1 represents an unexpected value). Even though I used the symbol parameter in px.scatter_3d to mark the unexpected values with a different point shape (diamond for 1, circle for 0), the difference is very subtle and I want it to be more dramatic. I was envisioning something like the example below (it doesn't need to be exactly this): for instance, the diamond-shaped points could have a different outline color or an additional shape/bubble around them. How would I do this?
Additionally, I have a set column which can take one of two values, set A or set B. I used the color parameter inside px.scatter_3d and set it to set, so the points are colored according to which set they came from. While it does what I asked, I don't want the colors to be blue and red but any two colors I specify. How would I be able to do this (let's say I want the colors to be blue and orange instead)? Thank you so much!
Here is the code I used:
import plotly.express as px
# X_combined is my DataFrame with columns x, y, z, set and predictions
fig = px.scatter_3d(X_combined, x='x', y='y', z='z',
                    color='set', symbol='predictions', opacity=0.7)
fig.update_traces(marker=dict(size=12,
                              line=dict(width=5,
                                        color='Black')),
                  selector=dict(mode='markers'))

You can use multiple go.Scatter3d() statements and gather them in a list to format each and every segment or extreme values more or less exactly as you'd like. This can be a bit more demanding than using px.scatter_3d(), but it will give you more control. The following plot is produced by the snippet below:
Plot:
Code:
import plotly.graph_objects as go
import numpy as np
import pandas as pd
# sample data: points along a helix
t = np.linspace(0, 10, 50)
x, y, z = np.cos(t), np.sin(t), t
# plotly data: two single-point traces used as enlarged highlights, plus the full series
data = [go.Scatter3d(x=[x[2]], y=[y[2]], z=[z[2]], mode='markers', marker=dict(size=20), opacity=0.8),
        go.Scatter3d(x=[x[26]], y=[y[26]], z=[z[26]], mode='markers', marker=dict(size=30), opacity=0.3),
        go.Scatter3d(x=x, y=y, z=z, mode='markers')]
fig = go.Figure(data)
fig.show()
How you identify the different segments, whether max or min values, is entirely up to you. Anyway, I hope this approach will be useful!

Understanding matplotlib and how to format matplotlib axis with a single digit?

I am having a very hard time getting the tick labels of a seaborn heatmap to show only integers (i.e. no floating-point numbers). I have two lists that form the axes of a data frame that I plot using seaborn.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr
x = np.linspace(0, 15, 151)
y = np.linspace(0, 15, 151)
my_data = np.random.rand(len(y), len(x))  # random substitute for my real data
df_map = pd.DataFrame(my_data, index = y, columns = x)
plt.figure()
ax = sns.heatmap(df_map, square = True, xticklabels = 20, yticklabels = 20)
ax.invert_yaxis()
I've reviewed many answers and the documents. My biggest problem is I have little experience and a very poor understanding of matplotlib and the docs feel like a separate language... Here are the things I've tried.
ATTEMPT 1: A slightly modified version of the solution to this question:
fmtr = tkr.StrMethodFormatter('{x:.0f}')
plt.gca().xaxis.set_major_formatter(fmtr)
I'm pretty sure tkr.StrMethodFormatter() is displaying every 20th index of the value it encounters in my axis string, which is probably due to my settings in sns.heatmap(). I tried different string inputs to tkr.StrMethodFormatter() without success. I also looked at two other questions and tried different combinations of the tkr classes used in the answers here and here.
ATTEMPT 2:
fmtr = tkr.StrMethodFormatter("{x:.0f}")
locator = tkr.MultipleLocator(50)
fstrform = tkr.FormatStrFormatter('%.0f')
plt.gca().xaxis.set_major_formatter(fmtr)
plt.gca().xaxis.set_major_locator(locator)
#plt.gca().xaxis.set_major_formatter(fstrform)
And now I'm at a complete loss. I've found out that the locator changes which nth indices are plotted, and both fmtr and fstrform change the number of decimals displayed, but I cannot for the life of me get the axes to display the integer values that exist in the axes lists!
Please help! I've been struggling for hours. It's probably something simple, and thank you!
As an aside:
Could someone please elaborate on the documentation excerpt in that question, specifically:
...and the field used for the position must be labeled pos.
Also, could someone please explain the differences between tkr.StrMethodFormatter("{x:.0f}") and tkr.FormatStrFormatter('%.0f')? I find it annoying there are two ways, each with their own syntax, to produce the same result.
UPDATE:
It took me a while to get around to implementing the solution provided by @ImportanceOfBeingErnest. I took an extra precaution and rounded the numbers in the x, y arrays. I'm not sure if this is necessary, but I've produced the result I wanted:
x = np.linspace(0, 15, 151)
y = np.linspace(0, 15, 151)
# round float numbers in axes arrays
x_rounded = [round(i,3) for i in x]
y_rounded = [round(i,3) for i in y]
my_data = np.random.rand(len(y_rounded), len(x_rounded))  # random substitute for my real data
df_map = pd.DataFrame(my_data, index = y_rounded , columns = x_rounded)
plt.figure()
ax0 = sns.heatmap(df_map, square = True, xticklabels = 20)
ax0.invert_yaxis()
labels = [label.get_text() for label in ax0.get_xticklabels()]
ax0.set_xticklabels(map(lambda x: "{:g}".format(float(x)), labels))
I'm still not entirely sure why this worked, though; see the comments between us for clarification.

The sad thing is, you didn't do anything wrong. The problem is just that seaborn has a very peculiar way of setting up its heatmap.
The ticks on the heatmap are at fixed positions and they have fixed labels. So to change them, those fixed labels need to be changed. An option to do so is to collect the labels, convert them back to numbers, and then set them back.
labels = [label.get_text() for label in ax.get_xticklabels()]
ax.set_xticklabels(map(lambda x: "{:g}".format(float(x)), labels))
labels = [label.get_text() for label in ax.get_yticklabels()]
ax.set_yticklabels(map(lambda x: "{:g}".format(float(x)), labels))
A word of caution: one should in principle never set the tick labels without setting the locations as well, but here seaborn is responsible for setting the positions. We just trust it to do so correctly.
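Put together with stand-in data, the relabeling step can be sketched as below. Assumptions: `my_data` is replaced by random numbers, the axis values are rounded as in the question's update so that seaborn's label strings read like "2.0", and a list comprehension is used instead of `map`. The "{:g}" format drops trailing zeros, so "2.0" becomes "2" while a value such as "2.5" would stay "2.5"; `as_short_number` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

x = np.round(np.linspace(0, 15, 151), 3)
y = np.round(np.linspace(0, 15, 151), 3)
my_data = np.random.rand(len(y), len(x))  # random stand-in for the real data
df_map = pd.DataFrame(my_data, index=y, columns=x)

fig, ax = plt.subplots()
sns.heatmap(df_map, square=True, xticklabels=20, yticklabels=20, ax=ax)
ax.invert_yaxis()


def as_short_number(text):
    """Hypothetical helper: '2.0' -> '2', '2.5' -> '2.5' (the "{:g}" trick)."""
    return "{:g}".format(float(text))


# seaborn placed fixed labels at fixed positions; only the label text is rewritten
ax.set_xticklabels([as_short_number(t.get_text()) for t in ax.get_xticklabels()])
ax.set_yticklabels([as_short_number(t.get_text()) for t in ax.get_yticklabels()])
print([t.get_text() for t in ax.get_xticklabels()])
```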
If you want numeric axes with numeric labels that can be formatted as attempted in the question, one may directly use a matplotlib plot.
import numpy as np
import seaborn as sns # seaborn only imported to get its rocket cmap
import matplotlib.pyplot as plt
my_data = np.random.rand(150,150)
x = (np.linspace(0, my_data.shape[1], my_data.shape[1]+1)-0.5)/10  # cell edges along x (columns)
y = (np.linspace(0, my_data.shape[0], my_data.shape[0]+1)-0.5)/10  # cell edges along y (rows)
fig, ax = plt.subplots()
pc = ax.pcolormesh(x, y, my_data, cmap="rocket")
fig.colorbar(pc)
ax.set_aspect("equal")
plt.show()
While this already works out of the box, you may still use locators and formatters as attempted in the question.
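As a sketch of that last point, which also touches the aside in the question about `pos` and the two formatter classes: `StrMethodFormatter` takes a `str.format`-style template in which the tick value must be named `x` and the tick position (its index along the axis) must be named `pos`, whereas `FormatStrFormatter` takes an old-style `%` template and only uses the value. For a plain number format such as `.0f`, both give the same text. The example swaps seaborn's "rocket" colormap for "viridis" so it needs only matplotlib and NumPy, and `MultipleLocator(2)` is an arbitrary tick spacing.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr

# The two formatter styles side by side (formatters are callables taking value, position)
new_style = tkr.StrMethodFormatter("{x:.0f}")
old_style = tkr.FormatStrFormatter("%.0f")
with_pos = tkr.StrMethodFormatter("{x:.1f} (tick {pos})")
print(new_style(3.0, 0), old_style(3.0, 0), with_pos(2.5, 1))  # 3 3 2.5 (tick 1)

# Numeric axes via pcolormesh, so locators and formatters behave normally
my_data = np.random.rand(150, 150)
x = (np.arange(my_data.shape[1] + 1) - 0.5) / 10  # cell edges along x
y = (np.arange(my_data.shape[0] + 1) - 0.5) / 10  # cell edges along y

fig, ax = plt.subplots()
pc = ax.pcolormesh(x, y, my_data, cmap="viridis")
fig.colorbar(pc)
ax.set_aspect("equal")
ax.xaxis.set_major_locator(tkr.MultipleLocator(2))  # a tick every 2 data units
ax.xaxis.set_major_formatter(new_style)             # whole numbers only
fig.canvas.draw()
print([t.get_text() for t in ax.get_xticklabels()])
```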

How to decrease the density of x-ticks in seaborn

I have some data, based on which I am trying to build a countplot in seaborn. So I do something like this:
data = np.hstack((np.random.normal(10, 5, 10000), np.random.normal(30, 8, 10000))).astype(int)
plot_ = sns.countplot(data)
and get my countplot:
The problem is that ticks on the x-axis are too dense (which makes them useless). I tried to decrease the density with plot_.xticks=np.arange(0, 40, 10) but it didn't help.
Also, is there a way to draw the plot in a single color?

Tick frequency
There seem to be multiple issues here:
1. You are assigning to plot_.xticks with the = operator. Ticks are changed through a function call instead (but not plt.xticks here; read point 2 first)!
2. seaborn's countplot returns an Axes object, not a figure.
3. You need to use the Axes-level way of changing x-ticks (which is not plt.xticks()).
Try this:
for ind, label in enumerate(plot_.get_xticklabels()):
    if ind % 10 == 0:  # every 10th label is kept
        label.set_visible(True)
    else:
        label.set_visible(False)
Colors
I think the data setup is not optimal for this type of plot. Seaborn interprets each unique value as a new category and introduces a new color. If I'm right, the number of colors and x-ticks equals len(np.unique(data)).
Compare your data to seaborn's examples (which are all based on datasets that can be imported to check).
I also think working with seaborn is much easier with pandas DataFrames than with NumPy arrays (I often prepare my data the wrong way, and subset selection needs preprocessing; DataFrames offer more). I think most of seaborn's examples use this kind of input.
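For the single-color part of the question, a sketch assuming a reasonably recent seaborn: pass the data as `x=` and give `countplot` one `color`. In recent seaborn releases (0.13 and later) a plot without `hue` is already drawn in a single color by default, so `color=` mostly matters on older versions; "steelblue" is an arbitrary choice, and the label loop is the index-based thinning from this answer written more compactly.

```python
import numpy as np
import seaborn as sns
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

np.random.seed(123)
data = np.hstack((np.random.normal(10, 5, 10000),
                  np.random.normal(30, 8, 10000))).astype(int)

fig, ax = plt.subplots()
sns.countplot(x=data, color="steelblue", ax=ax)  # one color for every bar

# Keep every 10th category label visible, hide the rest
for ind, label in enumerate(ax.get_xticklabels()):
    label.set_visible(ind % 10 == 0)
```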

Even though this was answered a while ago, here is another, perhaps simpler and more flexible alternative.
You can use a matplotlib axis tick locator to control which ticks are shown.
In this example you can use LinearLocator to achieve the same thing:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.ticker as ticker
data = np.hstack((np.random.normal(10, 5, 10000), np.random.normal(30, 8, 10000))).astype(int)
plot_ = sns.countplot(x=data)
plot_.xaxis.set_major_locator(ticker.LinearLocator(10))

Since you have tagged matplotlib, one solution other than toggling label visibility is to set only every nth tick and label, as follows:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
fig = plt.figure(); np.random.seed(123)
data = np.hstack((np.random.normal(10, 5, 10000), np.random.normal(30, 8, 10000))).astype(int)
plot_ = sns.countplot(x=data)
fig.canvas.draw()  # make sure the tick labels have been populated
new_ticks = [i.get_text() for i in plot_.get_xticklabels()]
plt.xticks(range(0, len(new_ticks), 10), new_ticks[::10])  # keep every 10th position and label

As a slight modification of the accepted answer, it is often more useful to select labels based on their value (not their index). For example, to display only values that are divisible by 10:
for label in plot_.get_xticklabels():
    if int(label.get_text()) % 10 == 0:  # np.int was removed in NumPy 1.24; use the built-in int
        label.set_visible(True)
    else:
        label.set_visible(False)
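A slightly more defensive version of the value-based filter can be sketched as a small helper. `keep_labels_divisible_by` is a hypothetical name; the sketch normalizes the Unicode minus sign (U+2212), which matplotlib can use when rendering negative numbers, and hides labels that do not parse as integers. It is demonstrated on a plain matplotlib bar chart with string categories so that it does not depend on seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt


def keep_labels_divisible_by(ax, step):
    """Hypothetical helper: show only x tick labels whose integer value is a multiple of step."""
    for label in ax.get_xticklabels():
        text = label.get_text().replace("\u2212", "-")  # normalize Unicode minus
        try:
            value = int(text)
        except ValueError:
            label.set_visible(False)  # not an integer label; hide it
            continue
        label.set_visible(value % step == 0)


categories = [str(v) for v in range(-5, 40)]
fig, ax = plt.subplots()
ax.bar(categories, [1] * len(categories))  # string x values give a categorical axis
keep_labels_divisible_by(ax, 10)
shown = [l.get_text() for l in ax.get_xticklabels() if l.get_visible()]
print(shown)  # ['0', '10', '20', '30']
```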

Subplots in two separate figure windows inside one loop using matplotlib

I want to plot two separate quantities while running through a loop. I want to create a separate figure window for each quantity, such that each iteration of the loop creates one subplot for each quantity.
Basically, I want my code to do something like this:
import numpy as np
import matplotlib.pyplot as plt
nr = [10, 15, 20, 25, 35, 50]
fig1 = plt.figure(1)
fig2 = plt.figure(2)
for y in range(len(nr)):
    m = np.arange(y+1)
    n = (y+1)*np.arange(y+1)
    fig1.subplot(3,2,y+1)
    fig1.plot(m,n, 'b')
    fig1.title('y=%s'%y)
    m1 = np.square(np.arange(y+1))
    n1 = (y+1)*np.arange(y+1)
    fig2.subplot(3,2,y+1)
    fig2.plot(m1,n1, 'r')
    fig2.title('y=%s'%y)
fig1.show()
fig2.show()
This code doesn't work; it gives me the error message 'Figure' object has no attribute 'subplot'. I've tried many variations based on this link - http://matplotlib.org/api/pyplot_api.html, but I am unable to work out the right way to do it.
In the output, I want two figure windows, each with 6 subplots, such that the first one contains plots of m vs n, and the second one contains plots of m1 vs n1.

Okay, long explanation because there are multiple issues here.
The biggest problem you are running into is that there are multiple ways to handle things in matplotlib. In fact, there are effectively multiple interfaces. The easiest and most commonly used method is to just create your plot using pyplot and its methods like pyplot.subplot and pyplot.plot. This can work well for quick plots, but will not work well for your situation.
Since you want to create two figures and alternate plotting between them, you are going to want to use matplotlib's object-oriented interface, i.e. the Figure and Axes objects that pyplot returns. You have gotten part of the way there yourself, but I'll try to help you with the last part.
You are good up until here:
import numpy as np
import matplotlib.pyplot as plt
nr = [10, 15, 20, 25, 35, 50]
fig1 = plt.figure(1)
fig2 = plt.figure(2)
for y in range(len(nr)):
    m = np.arange(y+1)
    n = (y+1)*np.arange(y+1)
but when you try to use the methods of Figure, you are getting confused and trying to use similar methods that belong to pyplot. The next portion should be rewritten as:
    ax1 = fig1.add_subplot(3,2,y+1)
    ax1.plot(m,n, 'b')
    ax1.set_title('y=%s'%y)
    m1 = np.square(np.arange(y+1))
    n1 = (y+1)*np.arange(y+1)
    ax2 = fig2.add_subplot(3,2,y+1)
    ax2.plot(m1,n1, 'r')
    ax2.set_title('y=%s'%y)
Here, what you have done is capture the Axes instance that is returned from add_subplot(), and then plot onto that Axes instance. Note that the subplot position (the third argument to Figure.add_subplot()) is 1-based, following the MATLAB convention rather than Python's zero-based indexing: for a 3x2 grid it must run from 1 to 6, so y+1 is correct here, and add_subplot(3,2,0) would raise a ValueError.
Finally, to show the figures you just created, you can either call pyplot.show() like this:
plt.show()
or you can save the figures to files like this:
fig1.savefig('fig1.png')
fig2.savefig('fig2.png')
The resulting figures look like this:

matplotlib NaN's vs pylab NaN's

I have two similar pieces of matplotlib codes that produce different results.
1:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0,10,100)
y = np.linspace(0,10,100)
y[10:40] = np.nan
plt.plot(x,y)
plt.savefig('fig')
2:
from pylab import *
x = linspace(0,10,100)
y = linspace(0,10,100)
y[10:40] = np.nan
plot(x,y)
savefig('fig')
Code #1 produces a straight line with the NaN region filled in by a solid line of a different color.
Code #2 produces a straight line but does not fill in the NaN region; instead there is a gap there.
How can I make code #1 produce a gap in place of the NaNs, like code #2? I have been googling for a couple of days and have come up with nothing. Any help or advice would be appreciated. Thanks in advance.

Just to explain what's probably happening:
The two pieces of code you showed are identical. They will always produce the same output if run by themselves. pylab is basically just a few lines of code that do the following (there's a bit more to it than this, but that's the basic idea):
from numpy import *
from matplotlib.mlab import *
from matplotlib.pyplot import *
There's absolutely no way for pylab.plot to reference a different function than plt.plot.
However, if you just call plt.plot (or pylab.plot; they're the same function), it plots on the current figure.
If you plotted something on that figure before, it will still be there. (If you're familiar with MATLAB: matplotlib behaves as if hold('on') were always set. Old matplotlib versions had plt.hold to change this, but it was deprecated in 2.0 and removed in 3.0; in any case it's better to be explicit in Python and just create a new figure.)
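A quick way to convince yourself that pylab.plot and plt.plot are the same function is to check object identity. This sketch imports matplotlib.pylab (the module behind from pylab import *) next to pyplot; it assumes a matplotlib version that still ships pylab, which current releases do, although its use is discouraged.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab

# pylab re-exports pyplot's functions, so these are literally the same objects
print(pylab.plot is plt.plot)        # True
print(pylab.savefig is plt.savefig)  # True
```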
Basically, you probably did this:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0,10,100)
y = np.linspace(0,10,100)
plt.plot(x,y)
plt.savefig('fig')
And then, in the same interactive IPython session, you did this:
y[10:40] = np.nan
plt.plot(x, y)
plt.savefig('fig')
Because you didn't call show, the current figure is still the same one as it was before. The "full" line is still present beneath the second one, and the second line with the NaN's is a different color because you've plotted on the same axes.
This is one of the many reasons why it's a good idea to use the object-oriented interface. That way you're aware of exactly which axes and figure you're plotting on.
For example:
fig, ax = plt.subplots()
ax.plot(x, y)
fig.savefig('test.png')
If you're not going to do that, at very least always explicitly create a new figure and/or axes when you want a new figure. (e.g. start by calling plt.figure())
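The scenario described above can be reproduced in a few lines. The sketch plots the complete line and then the NaN-containing copy through pyplot's implicit current figure, and afterwards plots only the NaN version on a fresh Figure/Axes pair; only the first Axes ends up with two overlapping lines.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
y = np.linspace(0, 10, 100)

# Pyplot state machine: both calls land on the same current Axes
plt.figure()              # start the demo from a fresh current figure
plt.plot(x, y)            # full line
y_gap = y.copy()
y_gap[10:40] = np.nan
plt.plot(x, y_gap)        # NaN version drawn on top, in the next color of the cycle
stale_ax = plt.gca()
print(len(stale_ax.lines))  # 2

# Object-oriented interface: a new Figure/Axes pair holds only the NaN version
fig, ax = plt.subplots()
ax.plot(x, y_gap)         # the NaNs break the line, leaving a visible gap
print(len(ax.lines))        # 1
```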

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.
