y-axis scaling in seaborn vs pandas - python

I'm plotting a scatter plot from a pandas DataFrame. This works correctly, but I want to use seaborn themes and special functions. When I plot the same data points with seaborn, the variation along the y-axis is almost invisible. The x-axis values range from 5000 to 15000, while the y-axis values lie in [-6, 6] × 10^-7.
If I multiply the y-axis values by 10^6, they display correctly, but the unscaled values remain indistinguishable in the seaborn-generated plot.
How can I make seaborn scale the y-axis automatically in the resulting plot?
Also, some rows contain NaN (not in this case). How can I ignore those while plotting, short of manually weeding out the rows containing NaN?
Below is the code I've used to plot.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("datascale.csv")
subdf = df.loc[(df.types == "easy") & (df.weight > 1300), ]
subdf = subdf.iloc[1:61, ]
subdf.drop(subdf.index[[25]], inplace=True) #row containing NaN
subdf.plot(x='length', y='speed', style='s') #scales y-axis correctly
sns.lmplot(x="length", y="speed", data=subdf, fit_reg=True, lowess=True) #doesn't scale y-axis properly
# multiplying by 10^6 displays the plot correctly, in matplotlib
plt.scatter(subdf['length'], 10**6*subdf['speed'])

Strange that seaborn does not scale the axis correctly. Nonetheless, you can correct this behaviour. First, keep a reference to the FacetGrid object that lmplot returns:
lm = sns.lmplot(x="length", y="speed", data=subdf, fit_reg=True)
After that you can manually set the y-axis limits:
lm.axes[0,0].set_ylim(min(subdf.speed), max(subdf.speed))
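A minimal sketch of the manual-limits workaround, shown on a plain matplotlib Axes so that it runs without the original data. The helper name padded_limits, the 5% margin, and the sample values are assumptions for illustration. The object lm.axes[0, 0] from the answer is also a matplotlib Axes, so the same set_ylim call should apply to it. np.nanmin and np.nanmax skip NaN, which the built-in min and max do not reliably do.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def padded_limits(values, pad_fraction=0.05):
    """Return (low, high) around the non-NaN values, widened by a small margin.

    Hypothetical helper, not part of seaborn or matplotlib.
    """
    values = np.asarray(values, dtype=float)
    low, high = np.nanmin(values), np.nanmax(values)
    pad = (high - low) * pad_fraction
    return low - pad, high + pad


# Illustrative data in the ranges described in the question (one NaN included).
length = np.linspace(5000, 15000, 7)
speed = np.array([-6e-7, -3e-7, np.nan, 0.0, 2e-7, 4e-7, 6e-7])

fig, ax = plt.subplots()
ax.scatter(length, speed)

# Set the y-limits explicitly from the data instead of relying on autoscaling.
y_low, y_high = padded_limits(speed)
ax.set_ylim(y_low, y_high)
```

The margin keeps the extreme points from sitting exactly on the plot border.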

Seaborn and matplotlib should just ignore NaN values when plotting. You should be able to leave them as is.
As for the y scaling: there might be a bug in seaborn.
The most basic workaround is still to scale the data before plotting.
Scale to microspeed in the dataframe before plotting and plot microspeed instead.
subdf['microspeed']=subdf['speed']*10**6
Or transform to log y before plotting, i.e.
import math
df = pd.DataFrame({'speed':[1, 100, 10**-6]})
df['logspeed'] = df['speed'].map(lambda x: math.log(x,10))
then plot logspeed instead of speed.
Another approach would be to use seaborn regplot instead.
Matplotlib scales and plots the data correctly for me as follows:
plt.plot(subdf['length'], subdf['speed'], 'o')
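The rescaling workaround can be combined with explicit NaN removal, in case a plotting function does not skip missing values on its own. This is a sketch with made-up sample values; the column names follow the question and the variable name clean is an assumption. Note that a log transform is only defined for positive values, so it would not apply to data that spans negative speeds as described in the question.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for subdf; the values are invented.
subdf = pd.DataFrame({
    "length": [5000, 7500, 10000, 12500, 15000],
    "speed": [-6e-7, 2e-7, np.nan, 4e-7, 6e-7],
})

# Drop only the rows where a plotted column is missing, and copy so the
# new column is added to an independent frame rather than a view.
clean = subdf.dropna(subset=["length", "speed"]).copy()

# Rescale to "micro" units so the values are of order 1.
clean["microspeed"] = clean["speed"] * 1e6
```

Plotting clean["microspeed"] against clean["length"] then avoids the tiny y range; the axis label should state the changed unit.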

Related

SNS Heatmap, display one column label out of two [duplicate]

I have a pandas dataframe of shape (39, 67). When I plot its seaborn heatmap, I don't get all the labels on the X and Y axes. The .get_xticklabels() method also returns only 23 labels.
matplotlib doesn't show any labels either (only numbers).
Both these heatmaps are for the same dataframe (39, 67).
To ensure the labels are visible, you have to set the parameters xticklabels, yticklabels to True, like so.
import seaborn as sns
sns.heatmap(dataframe, xticklabels=True, yticklabels=True)
Here's the documentation for the heatmap function.
import seaborn as sns
sns.heatmap(dataframe, xticklabels=1, yticklabels=1)
You may also play with figsize=(7, 5) to adjust the scale.
The answers here didn't work for me, so I followed the suggestions here.
Try opening a separate matplotlib window and tweaking the parameters there:
Python sns heatmap does not fully display x labels
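For the plain matplotlib case mentioned in the question (numbers instead of labels), a sketch along these lines sets one tick per row and column explicitly. The random data, the row/column names, and the figure size are assumptions for illustration; only the (39, 67) shape comes from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative frame with the shape from the question.
rng = np.random.default_rng(0)
dataframe = pd.DataFrame(
    rng.random((39, 67)),
    index=[f"row{i}" for i in range(39)],
    columns=[f"col{j}" for j in range(67)],
)

# A wide figure leaves room for 67 rotated column labels.
fig, ax = plt.subplots(figsize=(16, 9))
image = ax.imshow(dataframe.to_numpy(), aspect="auto")

# One tick per column and per row, labelled from the frame itself.
ax.set_xticks(np.arange(dataframe.shape[1]))
ax.set_xticklabels(list(dataframe.columns), rotation=90)
ax.set_yticks(np.arange(dataframe.shape[0]))
ax.set_yticklabels(list(dataframe.index))
fig.colorbar(image, ax=ax)
```

With this many labels, the figure size and font size usually need tuning so that neighbouring labels do not overlap.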

Seaborn pointplot x-axis distance between labels does not correspond to actual values

Seaborn pointplot does not space the x-axis ticks according to their values. Here is an MWE:
import seaborn as sns
df = sns.load_dataset('tips').iloc[:4]
df = df.round({'total_bill':0})
sns.pointplot(data=df, x='total_bill', y='tip')
How do I make the spacing of the x-axis labels correspond to their values?
Edit:
Looks like this is not a bug, but a feature. From the docs:
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
If there is some hack to override this behaviour I'm interested!!
The pointplot is meant for showing interactions between categorical or ordinal variables.
For your case, a line plot with markers places the points at their numeric x positions:
sns.lineplot(data=df, x='total_bill', y='tip',marker='o')
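To make the difference concrete, the following sketch draws the same points twice with plain matplotlib: once at ordinal positions 0, 1, … n-1 (which mimics what the quoted documentation describes) and once at their numeric x values. The sample values are illustrative and the axis names ax_ordinal and ax_numeric are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data: rounded bills with unequal gaps between them.
df = pd.DataFrame({
    "total_bill": [17.0, 10.0, 21.0, 24.0],
    "tip": [1.01, 1.66, 3.50, 3.31],
}).sort_values("total_bill")

fig, (ax_ordinal, ax_numeric) = plt.subplots(1, 2, figsize=(9, 4))

# Ordinal placement: points sit at 0..n-1, values appear only as tick labels.
positions = list(range(len(df)))
ax_ordinal.plot(positions, df["tip"].tolist(), marker="o")
ax_ordinal.set_xticks(positions)
ax_ordinal.set_xticklabels([str(v) for v in df["total_bill"]])
ax_ordinal.set_title("ordinal positions")

# Numeric placement: the spacing reflects the actual total_bill values.
ax_numeric.plot(df["total_bill"].tolist(), df["tip"].tolist(), marker="o")
ax_numeric.set_title("numeric positions")
```

In the right-hand panel the gap between 10 and 17 is visibly wider than the gap between 21 and 24, which is the spacing the question asks for.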

Display all x values of a graph

I know this has already been asked, but I could not solve my problem.
I have three pandas columns: one with dates, and the others with values.
I can get my graph with the two curves depending on date.
However, I cannot display all dates in the x axis. Can you help me?
import pandas as pd
import matplotlib.pyplot as plt
# mau_file is the pandas dataframe with three columns.
plt.figure()
mau_file.plot(x='month_date', y=['mau', 'nb_migs'], figsize=(10,5), grid=True)
plt.set_xticklabels(mau_file['month_date'])
plt.legend(loc='best')
plt.show()
Usually, plt.xticks() is used to set the positions and labels of the x-axis ticks.
As I'm not sure it is 100% compatible with a pandas structure, you may need to store your data in a plain list or a NumPy array.
Documentation of plt.xticks()
EDIT: It is possible to choose the orientation of the labels.
For example, plt.xticks(x, labels, rotation='vertical') will give you vertical labels.
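A sketch of the plt.xticks approach, assuming a frame with the three columns named in the question and invented monthly values. Plotting against integer positions and passing the formatted dates as labels avoids date-axis conversion altogether; this spaces the points equally, which is reasonable for regular monthly data but would misrepresent irregular dates.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative stand-in for mau_file; the values are invented.
mau_file = pd.DataFrame({
    "month_date": pd.date_range("2020-01-01", periods=12, freq="MS"),
    "mau": range(100, 112),
    "nb_migs": range(10, 22),
})

# One integer position and one text label per row.
positions = list(range(len(mau_file)))
labels = mau_file["month_date"].dt.strftime("%Y-%m").tolist()

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(positions, mau_file["mau"].tolist(), label="mau")
ax.plot(positions, mau_file["nb_migs"].tolist(), label="nb_migs")

# Show every date on the x-axis, rotated so the labels do not overlap.
plt.xticks(positions, labels, rotation="vertical")
ax.grid(True)
ax.legend(loc="best")
fig.tight_layout()
```

In an interactive session, a final plt.show() displays the figure.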

Histogram of a dataframe

I have a data frame as follows:
and I am trying to plot a histogram from it such that the letters {A,B,C,D} are on the x-axis and the y-axis shows the numbers. I have tried the following:
df.plot(kind='hist')
for which I get the address instead of the plot, i.e:
<matplotlib.axes._subplots.AxesSubplot at 0x11217d5f8>
I was wondering how I can show the plot?
IIUC, I think you need to transpose the dataframe to get the index ['A','B','C','D'] as the x-axis and then plot. Also use plt.show() to display the histogram. Recent versions of pandas display the plot directly along with the Axes object, but with older versions you need to call plt.show() explicitly to display it.
import matplotlib.pyplot as plt
df.T.plot(kind='hist')
plt.show()
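If the goal is one bar per letter with its number as the height, a bar plot may be closer to what the question describes than a histogram, which bins the values instead. The original data frame is not shown, so the single-row frame below is purely hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical frame: one row, with the letters as column names.
df = pd.DataFrame({"A": [3], "B": [7], "C": [2], "D": [5]})

# Transposing turns the letters into the index, which pandas uses as x labels.
ax = df.T.plot(kind="bar", legend=False)
ax.set_ylabel("value")
```

In a script with an interactive backend, a final plt.show() displays the figure.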

Create a checkerboard plot with unbalanced rows and colums

I have a dataset similar to this format X = [[1,4,5], [34,70,1,5], [43,89,4,11], [22,76,4]] where the length of element lists are not equal.
I want to create a checkerboard plot with 4 rows and 4 columns, where the color of each unit box corresponds to the value of its number, as shown by a colorbar. In this dataset some boxes will be missing (e.g. the 4th column of the first row).
How would I plot this in python using matplotlib?
Thanks
You can use the seaborn library or matplotlib to generate a heatmap. First, convert the data to a pandas dataframe so that the missing values are handled.
import pandas as pd
df = pd.DataFrame([[1,4,5],[34,70,1,5], [43,89,4,11],[22,76,4]])
%matplotlib inline
from matplotlib import pyplot as plt
import seaborn as sns
sns.heatmap(df)
plt.show()
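A matplotlib-only variant of the same idea, as a sketch: pandas pads the shorter rows with NaN, and masking those entries leaves the corresponding boxes empty. The white cell borders and the equal aspect ratio are styling assumptions, not part of the answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

X = [[1, 4, 5], [34, 70, 1, 5], [43, 89, 4, 11], [22, 76, 4]]

# Rows of unequal length are padded with NaN to form a 4 x 4 frame.
df = pd.DataFrame(X)

# Mask the NaN entries so those cells are not drawn.
grid = np.ma.masked_invalid(df.to_numpy(dtype=float))

fig, ax = plt.subplots()
mesh = ax.pcolormesh(grid, edgecolors="white", linewidth=1)
ax.invert_yaxis()        # put the first row at the top, as in the list
ax.set_aspect("equal")   # square boxes for a checkerboard look
fig.colorbar(mesh, ax=ax)
```

The two missing boxes (last column of the first and last rows) show the axes background instead of a color.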
