Let's look at a swarmplot, made with Python 3.5 and Seaborn on some data (which is stored in a pandas dataframe df, with column labels stored in another class; this does not matter for now, just look at the plot):
ax = sns.swarmplot(x=self.dte.label_temperature, y=self.dte.label_current, hue=self.dte.label_voltage, data = df)
Now the data is more readable if plotted in log scale on the y-axis because it goes over some decades.
So let's change the scaling to logarithmic:
ax.set_yscale("log")
ax.set_ylim(bottom = 5*10**-10)
Well, I have a problem with the gaps in the swarms. I guess they are there because the dot positions were calculated when the plot was created with a linear axis in mind, where the dots must not overlap. But now they look kind of strange, and there is enough space to form 4 equal-looking swarms.
My question is: How can I force seaborn to recalculate the position of the dots to create better looking swarms?
mwaskom hinted to me in the comments how to solve this.
It is even stated in the swarmplot documentation:
Note that arranging the points properly requires an accurate transformation between data and point coordinates. This means that non-default axis limits should be set before drawing the swarm plot.
Set an existing axis to log scale first, and then use it for the plot:
fig = plt.figure() # create figure
rect = 0,0,1,1 # create a rectangle for the new axis
log_ax = fig.add_axes(rect) # create a new axis (or use an existing one)
log_ax.set_yscale("log") # log first
sns.swarmplot(x=self.dte.label_temperature, y=self.dte.label_current, hue=self.dte.label_voltage, data = df, ax = log_ax)
This yields the correct and desired plotting behaviour:
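For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same idea. The dataframe, its column names ("temperature", "current", "voltage") and the value ranges are made up for illustration; only the order of operations (scale and limits first, swarmplot second) comes from the answer above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in data (assumption): currents spanning several decades.
rng = np.random.default_rng(0)
temperatures = np.repeat([100, 200, 300, 400], 30)
df = pd.DataFrame({
    "temperature": temperatures,
    "current": 10.0 ** rng.uniform(-9, -5, size=temperatures.size),
    "voltage": rng.choice(["1 V", "5 V"], size=temperatures.size),
})

fig, log_ax = plt.subplots()
log_ax.set_yscale("log")      # set the scale before drawing the swarm
log_ax.set_ylim(5e-10, 2e-5)  # non-default limits also go first
sns.swarmplot(x="temperature", y="current", hue="voltage", data=df, ax=log_ax)
```

Call plt.show() afterwards to display the figure. Because the axis is already logarithmic when the swarm is drawn, the dot positions are computed for the scale that is actually shown.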
I'm trying to create plots which show the correlation of the "value" parameter to different categorical parameters. Here's what I have so far:
plot = sns.pairplot(df, x_vars=['country', 'tier_code', 'industry', 'company_size', 'region'], y_vars=['value'], height=10)
Which produces the following set of plots:
As you can see, the x axis is extremely crowded for the "country" and "industry" plots. I would like to rotate the category labels 90 degrees so that they wouldn't overlap.
All the examples for rotating I could find were for other kinds of plots and didn't work for the pairplot. I could probably get it to work if I made each plot separately using catplot, but I would like to make them all at once. Is that possible?
I am using Google Colab in case it makes any difference. My seaborn version number is 0.10.0.
Manish's answer uses the get_xticklabels method, which doesn't always play well with the higher level seaborn functions in my experience. So here's a solution avoiding that. Since I don't have your data, I'm using seaborn's tips dataset for an example.
I'm naming the object returned by sns.pairplot() grid, just to remind us that it is a PairGrid instance. In general, its axes attribute yields a two-dimensional array of axes objects, corresponding to the subplot grid. So I'm using the flat attribute to iterate over it as a one-dimensional sequence, although it wouldn't be necessary in your special case with only one row of subplots.
In my example I don't want to rotate the labels for the third subplot, as they are single digits, so I slice the axes array accordingly with [:2].
import seaborn as sns
sns.set()
tips = sns.load_dataset("tips")
grid = sns.pairplot(tips, x_vars=['sex', 'day', 'size'], y_vars=['tip'])
for ax in grid.axes.flat[:2]:
    ax.tick_params(axis='x', labelrotation=90)
You can rotate x-axis labels as:
plot = sns.pairplot(df, x_vars=['country', 'tier_code', 'industry', 'company_size', 'region'],
y_vars=['value'], height=10)
rotation = 90
for axis in plot.fig.axes: # get all the axis
    axis.set_xticklabels(axis.get_xticklabels(), rotation = rotation)
plot.fig.show()
Hope it helps.
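A possible variant of the loop above, as a sketch only: set_xticklabels(get_xticklabels()) re-assigns the label text and may emit a warning in some Matplotlib versions, whereas plt.setp only changes the rotation of the existing label objects. The example uses plain Matplotlib subplots with made-up category names instead of the pairplot from the question; the same loop over fig.axes should apply to the figure of a PairGrid.

```python
import matplotlib.pyplot as plt

# Made-up categories standing in for long country/industry names.
categories = ["Argentina", "Belgium", "Cameroon", "Denmark"]

fig, axes = plt.subplots(1, 2, sharey=True)
for ax in axes:
    ax.scatter(categories, [3, 1, 4, 2])

# Rotate the existing tick labels on every axes of the figure.
for ax in fig.axes:
    plt.setp(ax.get_xticklabels(), rotation=90)
```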
I was wondering if it was possible to apply hue to only the lower part of a seaborn PairGrid.
For example, say I have the following figure:
For what I need to present I'd like to keep the density plots on the diagonal, the overall scatter plots on the upper (with printed correlation coefficients above them which I know how to do), but on the lower I want to split the points up by hue just to show my audience what happens if we do subset the data.
I thought about just finding out the correlations for the upper, doing a hue plot and just changing all of the markers in the upper plots to the same colour, but then I lose the densities on my diagonal.
Does anybody know if my problem is possible?
Current code I'm using is
ff = sns.PairGrid(test2,vars=['OzekePower','Power0','Power1','Power2'],palette="husl")
ff.map_upper(sns.scatterplot)
ff.map_lower(sns.scatterplot)
ff.map_diag(sns.kdeplot)
So what I'm hoping for is something like
ff.map_lower(sns.scatterplot,hue='species')
but this yields an error.
EDIT - I can do it if I leave the diag and upper blank and assign to the empty plots individually but this seems a lot more lengthy.
Unfortunately, PairGrid does not have a map_dataframe method, which could otherwise be used to include further dataframe columns into the mapping. Kind of a hack to obtain hue only in the lower part of the PairGrid is to leave the hue argument out in the PairGrid creation, and fill the grid where no hue is desired.
Then manually set the parameters required for hue on the grid, and finally call map_lower, which will then see the grid as if it had hue specified from the start.
import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset("iris", cache=True)
g = sns.PairGrid(df)
g.map_upper(sns.scatterplot)
g.map_diag(sns.kdeplot)
# Now set parameters needed for `hue`
g.hue_vals = df["species"]
g.hue_names = df["species"].unique()
g.palette = sns.color_palette("husl", len(g.hue_names))
# Then map lower
g.map_lower(sns.scatterplot)
plt.show()
I am trying to plot data and a function with matplotlib 2.0 under Python 2.7.
The x values of the function evolve with time: x first decreases to a certain value, then increases again.
If the function is plotted against time, it looks like this (plot of data against time):
I need the same x axis evolution for plotting against real x values. Unfortunately as the x values are the same for both parts before and after, both values are mixed together. This gives me the wrong data plot:
In this example it means I need the x-axis to start at the value 2.4, decrease to 1.0, and then increase again to 2.4. I swear I found before that this is possible, but unfortunately I can't find a trace of that again.
A matplotlib axis is by default linearly increasing. More importantly, there must be an injective mapping of the number line to the axis units. So changing the data range is not really an option (at least when the aim is to keep things simple).
It would hence be good to keep the original numbers and only change the ticks and ticklabels on the axis. E.g. you could use a FuncFormatter to map the original numbers to
np.abs(x-tp)+tp
where tp would be the turning point.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
x = np.linspace(-10,20,151)
y = np.exp(-(x-5)**2/19.)
plt.plot(x,y)
tp = 5
fmt = lambda x,pos:"{:g}".format(np.abs(x-tp)+tp)
plt.gca().xaxis.set_major_formatter(matplotlib.ticker.FuncFormatter(fmt))
plt.show()
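Applied to the numbers from the question (2.4 down to 1.0 and back up to 2.4), the same idea might look like the sketch below. The y values are placeholders, and the "unfolding" step (mirroring the first half around the turning point so that the plotted coordinate increases monotonically) is my assumption about how the data would have to be prepared; it is not spelled out in the answer above.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

tp = 1.0  # turning point of x
x = np.concatenate([np.linspace(2.4, 1.0, 50), np.linspace(1.0, 2.4, 50)[1:]])
y = np.sin(np.linspace(0, 3, x.size))  # placeholder data

# Mirror the decreasing part around tp so the plotted coordinate u is monotonic.
turn = int(np.argmin(x))
u = x.copy()
u[:turn] = 2 * tp - x[:turn]

fig, ax = plt.subplots()
ax.plot(u, y)
# Label each tick with the original x value instead of the unfolded coordinate.
ax.xaxis.set_major_formatter(
    FuncFormatter(lambda val, pos: "{:g}".format(abs(val - tp) + tp))
)
```

The left end of the axis is at u = -0.4 but is labelled 2.4, the middle is labelled 1, and the right end is labelled 2.4 again.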
One option would be to use two axes, and plot your two timespans separately on each axes.
For instance, if you have the following data:
myX = np.linspace(1,2.4,100)
myY1 = -1*myX
myY2 = -0.5*myX-0.5
plt.plot(myX,myY1, c='b')
plt.plot(myX,myY2, c='g')
you can instead create two subplots with a shared y-axis and no space between the two axes, plot each time span independently, and finally, adjust the limits of one of your x-axis to reverse the order of the points
fig, (ax1,ax2) = plt.subplots(1,2, gridspec_kw={'wspace':0}, sharey=True)
ax1.plot(myX,myY1, c='b')
ax2.plot(myX,myY2, c='g')
ax1.set_xlim((2.4,1))
ax2.set_xlim((1,2.4))
My goal is to have a single-column heat map, but for some reason the code I normally use for heat maps doesn't work if I'm not using a 2-D array.
vec1 = np.asarray([1,2,3,4,5])
fig, ax = plt.subplots()
plt.imshow(vec1, cmap='jet')
I know it's weird to show a single column vector as a heat map, but it's a nice visual for my purposes. I just want a column of colored squares that I can label along the y-axis to show a ranked list of things to people.
You could use the library Seaborn to do this. In Seaborn you can identify specific columns to plot. In this case that'd be your array. The following should accomplish what you're wanting
import numpy as np
import matplotlib.pyplot as plt
import seaborn
vec1 = np.asarray([1,2,3,4,5])
fig, ax = plt.subplots()
seaborn.heatmap([vec1])
Then you'll just have to do your formatting on that heatmap as you would in matplotlib.
http://seaborn.pydata.org/generated/seaborn.heatmap.html
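If you would rather stay with plain Matplotlib, a minimal sketch: imshow needs a 2-D array, so reshaping the vector to shape (5, 1) gives a single column of cells. The label strings are hypothetical placeholders for the ranked items.

```python
import numpy as np
import matplotlib.pyplot as plt

vec1 = np.asarray([1, 2, 3, 4, 5])
labels = ["first", "second", "third", "fourth", "fifth"]  # hypothetical names

fig, ax = plt.subplots()
im = ax.imshow(vec1.reshape(-1, 1), cmap="jet")  # shape (5, 1): one column
ax.set_xticks([])                                # the x-axis has no meaning here
ax.set_yticks(np.arange(len(labels)))
ax.set_yticklabels(labels)
fig.colorbar(im, ax=ax)
```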
Starting from the previous answer, I've come up with an approach which uses both Seaborn and Matplotlib's transforms to do what pavlov requested in their comment (that is, swapping the axes in a heatmap even though Seaborn does not have an orientation parameter).
Let's start from the previous answer:
vec1 = np.asarray([1,2,3,4,5])
sns.heatmap([vec1])
plt.show()
Using heatmap on a single vector yields the following result:
Ok, let's swap the x-axis with the y-axis. To do that, we can use an Affine2D transform, applying a rotation of 90 degrees.
from matplotlib import transforms
tr = transforms.Affine2D().rotate_deg(90)
Let's also reshape the initial array to make it resemble a column vector:
vec2 = vec1.reshape(vec1.shape[0], 1)
Now we can plot the heatmap. Note that plt.show() has no transform parameter, so the rotation above is never applied; the column orientation comes from the reshaped array alone:
sns.heatmap(vec2)
plt.show()
The resulting plot is:
Now, if we want to force each row to be a square, we can simply use the square=True parameter:
sns.heatmap(vec2, square=True)
plt.show()
This is the final result:
Hope it helps!
I have a pair of lists of numbers representing points in a 2-D space, and I want to represent the y/x ratios for these points as a 1-dimensional heatmap, with a diverging color map centered around 1, or the logs of my ratios, with a diverging color map centered around 0.
How do I do that?
My current attempt (borrowing somewhat from Heatmap in matplotlib with pcolor?):
import numpy as np
import matplotlib.pyplot as plt
# There must be a better way to generate arrays of random values
x_values = [np.random.random() for _ in range(10)]
y_values = [np.random.random() for _ in range(10)]
labels = list("abcdefghij")
ratios = np.asarray(y_values) / np.asarray(x_values)
axis = plt.gca()
# I transpose the array to get the points arranged vertically
heatmap = axis.pcolor(np.log2([ratios]).T, cmap=plt.cm.PuOr)
# Put labels left of the colour cells
axis.set_yticks(np.arange(len(labels)) + 0.5, minor=False)
# (Not sure I get the label order correct...)
axis.set_yticklabels(labels)
# I don't want ticks on the x-axis: this has no meaning here
axis.set_xticks([])
plt.show()
Some points I'm not satisfied with:
The coloured cells I obtain are horizontally-elongated rectangles. I would like to control the width of these cells and obtain a column of cells.
I would like to add a legend for the color map. heatmap.colorbar = plt.colorbar() fails with RuntimeError: No mappable was found to use for colorbar creation. First define a mappable such as an image (with imshow) or a contour set (with contourf).
One important point:
matplotlib/pyplot always leaves me confused: there seem to be a lot of ways to do things and I get lost in the documentation. I never know what would be the "clean" way to do what I want: I welcome suggestions of reading material that would help me clarify my very approximate understanding of these things.
Just 2 more lines:
axis.set_aspect('equal') # X scale matches Y scale
plt.colorbar(mappable=heatmap) # Tells plt where it should find the color info.
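Put together with the question's code, a complete version could look like the sketch below. Two things in it are my additions rather than part of the lines above: the random values are drawn as arrays (with a small offset so the ratios stay finite), and vmin/vmax are set symmetrically so that the diverging colour map is centred on 0.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x_values = rng.random(10) + 0.1  # offset (assumption) keeps the ratios finite
y_values = rng.random(10) + 0.1
labels = list("abcdefghij")
log_ratios = np.log2(y_values / x_values)

# Symmetric colour limits centre the diverging colour map on 0.
limit = np.abs(log_ratios).max()

fig, axis = plt.subplots()
heatmap = axis.pcolor(log_ratios.reshape(-1, 1), cmap=plt.cm.PuOr,
                      vmin=-limit, vmax=limit)
axis.set_yticks(np.arange(len(labels)) + 0.5)
axis.set_yticklabels(labels)
axis.set_xticks([])
axis.set_aspect("equal")           # square cells: X scale matches Y scale
fig.colorbar(heatmap, ax=axis)     # the pcolor result is the mappable
```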
I can't answer your final question very well. Part of it is because there are two ways of doing things in matplotlib: the axis way (axis.do_something...) and the MATLAB-clone way (plt.some_plot_method). Unfortunately we can't change that, and it is a good feature for people migrating to matplotlib. As far as the "clean way" is concerned, I prefer to use whatever produces the shorter code. I guess that is in line with the Python motto: Simple is better than complex, and Readability counts.