pandas.DataFrame.plot showing colormap inconsistently - python

So I am trying to make some plots with the cmap "jet". It kept appearing as viridis, so I dug around SE and tried some very simple plots:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x = np.arange(0, 100)
y = x
t = x
df = pd.DataFrame([x,y]).T
df.plot(kind="scatter", x=0, y=1, c=t, cmap="jet")
x = np.arange(0, 100.1)
y = x
t = x
df = pd.DataFrame([x,y]).T
df.plot(kind="scatter", x=0, y=1, c=t, cmap="jet")
Any thoughts on what is going on here? I can tell that it has something to do with the dtype of the columns in the DataFrame (adding dtype="float" to the first set of code gave the same result as the second set of code), but I don't see why this would be the case.
Naturally, what I would really like is a workaround if there isn't something wrong with my code.

It does seem to be related to the pandas (scatter) plot and, as you've pointed out, to the float dtype. There are some more details at the end.
A workaround is to use matplotlib directly.
The plot looks the same in the end, but the cmap="jet" setting is also applied for the float dtype:
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x = np.arange(0, 100.1)
y = x
t = x
df = pd.DataFrame([x,y]).T
fig, ax = plt.subplots(1,1)
sc_plot = ax.scatter(df[0], df[1], c=t, cmap="jet")
fig.colorbar(sc_plot)
ax.set_ylabel('1')
ax.set_xlabel('0')
plt.show()
Or the shorter version (a little closer to the brief df.plot call) using pyplot instead of the object-oriented interface:
df = pd.DataFrame([x,y]).T
sc_plot = plt.scatter(df[0], df[1], c=t, cmap="jet")
plt.colorbar(sc_plot)
plt.ylabel('1')
plt.xlabel('0')
plt.show()
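The matplotlib workaround can be wrapped in a small helper so that each plot needs only one call, much like the original df.plot line. This is a sketch: the function name scatter_with_cmap is hypothetical, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_with_cmap(df, xcol, ycol, c, cmap="jet", ax=None):
    """Scatter df[xcol] against df[ycol], colored by c with an explicit colormap."""
    if ax is None:
        fig, ax = plt.subplots()
    else:
        fig = ax.figure
    sc = ax.scatter(df[xcol], df[ycol], c=c, cmap=cmap)
    fig.colorbar(sc, ax=ax)  # colorbar drawn in its own, extra Axes
    ax.set_xlabel(str(xcol))
    ax.set_ylabel(str(ycol))
    return sc


x = np.arange(0, 100.1)  # float dtype: the case where df.plot lost the colormap
df = pd.DataFrame({0: x, 1: x})
sc = scatter_with_cmap(df, 0, 1, c=x, cmap="jet")
print(sc.get_cmap().name)
```

Returning the PathCollection keeps it easy to adjust the colorbar or colormap afterwards.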
Concerning the root cause why pandas df.plot isn't following the cmap setting:
The closest I could find is that pandas scatter plot c takes
str, int or array-like
(though I'm not sure why t isn't interpreted as referring to the index, which would again be int).
Even df.plot(kind="scatter", x=0, y=1, c=df.index.values.tolist(), cmap='jet') falls back to viridis, while df.index.values.tolist() clearly is just int.
This is even stranger, as pandas df.plot also uses matplotlib by default:
Uses the backend specified by the option plotting.backend. By default,
matplotlib is used.

Looks like it's a new bug in pandas 1.5.0. Reverting pandas to 1.4.4 fixes it. So if you don't need 1.5.0 specifically, I'd suggest reinstalling 1.4.4 until the bug is fixed.
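To check whether a particular pandas installation is affected, one option is to ask the scatter collection that pandas created which colormap it ended up with. A minimal sketch (the helper name is hypothetical); on a version with the regression described above it should report viridis when asked for "jet", otherwise jet:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only needed so the check runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def cmap_used_by_pandas_scatter(cmap="jet"):
    """Return the name of the colormap pandas applied to a float-dtype scatter."""
    x = np.arange(0, 100.1)
    df = pd.DataFrame([x, x]).T
    ax = df.plot(kind="scatter", x=0, y=1, c=x, cmap=cmap)
    name = ax.collections[0].get_cmap().name  # the scatter is the first collection
    plt.close(ax.figure)
    return name


print(pd.__version__, cmap_used_by_pandas_scatter())
```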

Related

matplotlib swap x and y axis

Hello, I have made a plot in matplotlib using pandas; however, I need to swap my x and y axes.
Here is my plot:
broomstick plot
However, I need it to look like this:
correct broomstick plot
I'm using a pandas dataframe to plot the data.
I've looked over some documentation and other posts about swapping the x and y axes and haven't found an easy way to do this.
Here some of my python code:
python code
Any resources or ideas would be greatly appreciated.
Try this. You need to include your y values as an additional column; then you can choose which column goes on which axis with df.plot(x=..., y=...):
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 100)
y = np.cos(x)
df = pd.DataFrame({'y': y, 'x': x})
df.plot(x='x')
plt.show()
df.plot(x='y')
plt.show()
Plots:
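The same swap can be done without the pandas plot wrapper by passing the columns to matplotlib in the opposite order. A minimal sketch reusing the cosine data from the answer above; the side-by-side layout is only there for comparison:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = np.linspace(0, 10, 100)
df = pd.DataFrame({"y": np.cos(x), "x": x})

fig, (ax_normal, ax_swapped) = plt.subplots(1, 2, figsize=(8, 3))
ax_normal.plot(df["x"], df["y"])    # x horizontal, y vertical
ax_normal.set_xlabel("x")
ax_normal.set_ylabel("y")
ax_swapped.plot(df["y"], df["x"])   # same data with the axes swapped
ax_swapped.set_xlabel("y")
ax_swapped.set_ylabel("x")
fig.tight_layout()
```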

Howto force Pandas and native matplotlib to share axis

Hi folks,
Consider the following example
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
fig, (ax1,ax2) = plt.subplots(2,1)
dates = pd.date_range("2018-01-01","2019-01-01",freq = "1d")
x = pd.DataFrame(index = dates, data = np.linspace(0,1,len(dates)) )
x.plot(ax=ax1)
y = np.random.random([len(dates),100]) * x.values
ax2.pcolormesh(range(len(x)), np.linspace(-1,1,100), y.T)
plt.show()
At this point, I would like both axes (ax1, ax2) to share the x-axis, i.e. display proper pandas dates on the second axis. sharex=True does not seem to work. How can I achieve that? I tried different possibilities, none of which worked.
Edit: Since the pandas date formatting is superior to the native matplotlib formatting, please provide a solution where pandas date formatting is used (for instance, zooming in an interactive environment works much better with pandas date formatting). Thank you!
One way to do it would be to do all the plotting with matplotlib, this way there are no problems with the different time formats being used:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
fig, (ax1,ax2) = plt.subplots(2,1, sharex='col')
dates = pd.date_range("2018-01-01","2019-01-01",freq = "1d")
x = pd.DataFrame(index = dates, data = np.linspace(0,1,len(dates)) )
#x.plot(ax=ax1)
ax1.plot(x.index, x.values)
y = np.random.random([len(dates),100]) * x.values
ax2.pcolormesh(x.index, np.linspace(-1,1,100), y.T)
fig.tight_layout()
plt.show()
This gives the following plot:
What seems to work fine is to first plot the same line into the axes that should host the image, then plot the image, then remove the line again. What this does is that it tells pandas to apply its locators and formatters to that axes; they will stay after removing the line.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
fig, (ax1,ax2) = plt.subplots(2,1, sharex=True)
dates = pd.date_range("2018-01-01","2019-01-01",freq = "1d")
x = pd.DataFrame(index = dates, data = np.linspace(0,1,len(dates)) )
x.plot(ax=ax1)
y = np.random.random([len(dates),100]) * x.values
x.plot(ax=ax2, legend=False)
ax2.pcolormesh(dates, np.linspace(-1,1,100), y.T)
ax2.lines[0].remove()
plt.show()
Note that this solution may have caveats when zooming or panning. Consider it a hack: use it as long as it works, but don't blame anyone once it doesn't.
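If the all-matplotlib route from the first answer is acceptable, matplotlib's own ConciseDateFormatter gives fairly compact date labels. Note that this swaps in matplotlib date formatting, not the pandas formatting the question asks for. A sketch assuming matplotlib 3.1 or later (where ConciseDateFormatter exists); the dates are converted to Matplotlib date numbers and pcolormesh gets explicit cell edges:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = pd.date_range("2018-01-01", "2019-01-01", freq="1d")
x = pd.DataFrame(index=dates, data=np.linspace(0, 1, len(dates)))
y = np.random.random([len(dates), 100]) * x.values

# Cell edges for pcolormesh: one more edge than data columns/rows (shading="flat").
x_edges = mdates.date2num(pd.date_range(dates[0], periods=len(dates) + 1, freq="1d"))
y_edges = np.linspace(-1, 1, 101)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(mdates.date2num(dates), x.values[:, 0])
ax2.pcolormesh(x_edges, y_edges, y.T, shading="flat")

# Shared x axes share their tick locator and formatter, so setting them once suffices.
locator = mdates.AutoDateLocator()
ax2.xaxis.set_major_locator(locator)
ax2.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
fig.tight_layout()
```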

Set Seaborn PairGrid x-axis with 2 different value ranges

[The resolution is described below.]
I'm trying to create a PairGrid. The x-axis has at least 2 different value ranges, and even when 'cvar' below is plotted by itself, the x-axis labels overlap each other.
My question: is there a way to tilt the x-axis labels to be vertical, or to have fewer x-axis labels so they don't overlap? Is there another way to solve this issue?
====================
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
columns = ['avar', 'bvar', 'cvar']
index = np.arange(10)
# .ix was removed in pandas 1.0; build the frame directly from the random array
myarray = np.random.random((10, 3))
df = pd.DataFrame(myarray, columns=columns, index=index)
df['cvar'] = [400,450,43567,23000,19030,35607,38900,30202,24332,22322]
fig1 = sns.PairGrid(df, y_vars=['avar'],
x_vars=['bvar', 'cvar'],
palette="GnBu_d")
fig1.map(plt.scatter, s=40, edgecolor="white")
# The fix: Add the following to rotate the x axis.
plt.xticks( rotation= -45 )
=====================
The code above produces this image
Thanks!
I finally figured it out: I added plt.xticks( rotation= -45 ) to the original code above. More can be found on the Matplotlib site.
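plt.xticks acts only on the current Axes, which after fig1.map is typically the last panel, so with several x_vars only one panel may get rotated labels. A sketch that rotates the labels on every Axes instead; rotate_xticklabels is a hypothetical helper name, shown on a plain 1x2 grid so it needs only matplotlib:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def rotate_xticklabels(axes, rotation=-45):
    """Rotate the x tick labels of every Axes in `axes` (any iterable of Axes)."""
    for ax in axes:
        ax.tick_params(axis="x", labelrotation=rotation)


fig, axs = plt.subplots(1, 2)
axs[0].scatter(np.random.random(10), np.random.random(10))
axs[1].scatter([400, 450, 43567, 23000, 19030], np.random.random(5))
rotate_xticklabels(axs.flat, rotation=-45)
```

For the PairGrid in the question this would be rotate_xticklabels(fig1.axes.flat), since PairGrid.axes is a 2-D array of Axes.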

Matplotlib normalize colorbar (Python)

I'm trying to plot a contourf plot using matplotlib (and numpy, of course). It works and plots what it should, but unfortunately I cannot set the colorbar range. The problem is that I have plenty of plots and need all of them to have the same colorbar (same min and max, same colors). I copy-pasted almost every code snippet I found on the internet, but without success. My code so far:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
[...]
plotFreq, plotCoord = np.meshgrid(plotFreqVect, plotCoordVect)
figHandler = plt.figure()
cont_PSD = plt.contourf(plotFreq, plotCoord, plotPxx, 200, linestyle=None)
normi = mpl.colors.Normalize(vmin=-80, vmax=20)
colbar_PSD = plt.colorbar(cont_PSD)
colbar_PSD.set_norm(normi)
#colbar_PSD.norm = normi
#mpl.colors.Normalize(vmin=-80, vmax=20)
plt.axis([1, 1000, -400, 400])
As you can see, there are three different lines for the colorbar norm, and none of them works. The range is still set automatically...
I mean, everything else is working, so why not the colorbar? I don't even get errors or warnings.
Thanks,
itpdg
EDIT 1: Pictures, with plt.clim(-80,20):
Please use the levels parameter. A set of examples (each plotting pair below was run in its own notebook cell, so each produces its own figure):
z = np.random.random((10,10))
Without levels, the colorbar will be auto-scaled:
plt.contourf(z)
plt.colorbar()
plt.contourf(z*2)
plt.colorbar()
Control the colorbar with explicit levels:
plt.contourf(z*2, levels=np.linspace(0,2,20))
plt.colorbar()
plt.contourf(z, levels=np.linspace(0,2,20))
plt.colorbar()
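Since the question needs many plots with identical colors, the same levels array can be passed to every contourf call; with one shared array, all panels use the same boundaries and a single colorbar fits them all. A sketch with random stand-in data (the datasets and the choice of 21 levels are assumptions):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical datasets with different value ranges.
datasets = [rng.uniform(-80, 20, (10, 10)), rng.uniform(-60, 0, (10, 10))]

# One fixed set of levels for every plot: same min, max and color steps.
levels = np.linspace(-80, 20, 21)

fig, axs = plt.subplots(1, len(datasets), figsize=(8, 3))
contour_sets = [ax.contourf(z, levels=levels) for ax, z in zip(axs, datasets)]
fig.colorbar(contour_sets[0], ax=list(axs))  # one colorbar valid for all panels
```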
I ran into this issue a while back and thought it was a bug (see MPL issue #5055). It's not, but it does require using the extend kwarg, which was non-intuitive to me. Here's what you want to do:
normi = mpl.colors.Normalize(vmin=-80, vmax=20)
cont_PSD = plt.contourf(plotFreq, plotCoord, plotPxx,
                        np.linspace(-80, 20, 200),
                        linestyle=None,
                        norm=normi, extend='both')
# pass the contour set (not the colorbar) to plt.colorbar
colbar_PSD = plt.colorbar(cont_PSD)
You can do away with the plt.clim, colbar_PSD.set_norm and other similar calls.
More example uses of extend= are available here.
Note that this will create a colorbar with 'triangles' at the top and bottom indicating that the data extends beyond the colorbar range, but I think you'll like them once you get used to them; they are descriptive.
Good luck!
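For reference, a self-contained version of the answer above with random stand-in arrays in place of plotFreqVect, plotCoordVect and plotPxx (all hypothetical here), where the data deliberately exceeds the -80 to 20 range so the extension triangles have something to show:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the question's arrays.
plotFreqVect = np.linspace(1, 1000, 50)
plotCoordVect = np.linspace(-400, 400, 40)
plotFreq, plotCoord = np.meshgrid(plotFreqVect, plotCoordVect)
plotPxx = -120 + 160 * np.random.random(plotFreq.shape)  # spans -120 to 40

normi = mpl.colors.Normalize(vmin=-80, vmax=20)
cont_PSD = plt.contourf(plotFreq, plotCoord, plotPxx,
                        np.linspace(-80, 20, 200),
                        norm=normi, extend="both")  # values outside the range get the end colors
colbar_PSD = plt.colorbar(cont_PSD)
plt.axis([1, 1000, -400, 400])
```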
Add this after plt.colorbar():
plt.clim(minimal_value, maximal_value)
For the contour plot, add the vmin and vmax arguments:
cont_PSD = plt.contourf(plotFreq, plotCoord, plotPxx, 200, linestyle=None, vmin=minimal_value, vmax=maximal_value)
Your complete code should look like this:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
[...]
plotFreq, plotCoord = np.meshgrid(plotFreqVect, plotCoordVect)
figHandler = plt.figure()
cont_PSD = plt.contourf(plotFreq, plotCoord, plotPxx, 200, linestyle=None, vmin=minimal_value, vmax=maximal_value)
plt.colorbar()
plt.clim(minimal_value, maximal_value)
plt.show()

Format y axis as percent

I have an existing plot that was created with pandas like this:
df['myvar'].plot(kind='bar')
The y axis is formatted as float and I want to change it to percentages. All of the solutions I found use ax.xyz syntax, and I can only place code below the line above that creates the plot (I cannot add ax=ax to that line).
How can I format the y axis as percentages without changing the line above?
Here is the solution I found, but it requires that I redefine the plot:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick
data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))
fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)
ax.plot(perc, data)
fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)
plt.show()
Link to the above solution: Pyplot: using percentage on x axis
This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you just need one line to reformat your axis (two if you count the import of matplotlib.ticker):
import ...
import matplotlib.ticker as mtick
ax = df['myvar'].plot(kind='bar')
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
PercentFormatter() accepts three arguments: xmax, decimals and symbol. xmax allows you to set the value that corresponds to 100% on the axis. This is nice if you have data from 0.0 to 1.0 and you want to display it from 0% to 100%. Just do PercentFormatter(1.0).
The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal points based on how much of the axes you are showing.
Update
PercentFormatter was introduced into Matplotlib proper in version 2.1.0.
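Applied to the exact line from the question, with fractions between 0 and 1 in a hypothetical myvar column, a sketch might look like this (PercentFormatter needs Matplotlib 2.1 or later):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas as pd

# Hypothetical data: fractions between 0 and 1.
df = pd.DataFrame({"myvar": [0.12, 0.35, 0.5, 0.8]})

ax = df["myvar"].plot(kind="bar")  # the unchanged plotting line from the question
# xmax=1.0: a data value of 1.0 is shown as 100%; decimals=0: whole percentages.
ax.yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1.0, decimals=0))

fmt = ax.yaxis.get_major_formatter()
print(fmt(0.5), fmt(0.25))  # 50% 25%
```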
A pandas DataFrame plot returns the ax for you, and then you can manipulate the axes however you want.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100,5))
# you get ax from here
ax = df.plot()
type(ax) # matplotlib.axes._subplots.AxesSubplot
# manipulate
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])
Jianxun's solution did the job for me but broke the y value indicator at the bottom left of the window.
I ended up using FuncFormatter instead (and also stripped the unnecessary trailing zeros as suggested here):
import pandas as pd
import numpy as np
from matplotlib.ticker import FuncFormatter
df = pd.DataFrame(np.random.randn(100,5))
ax = df.plot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
Generally speaking, I'd recommend using FuncFormatter for label formatting: it's reliable and versatile.
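A sketch of a FuncFormatter that multiplies by 100, keeps at most two decimals and strips trailing zeros; the function name percent_label and the two-decimal cap are assumptions, not part of the answer:

```python
from matplotlib.ticker import FuncFormatter


def percent_label(y, _pos=None):
    """Format a fraction as a percentage with at most two decimals, dropping trailing zeros."""
    text = f"{y * 100:.2f}".rstrip("0").rstrip(".")
    return text + "%"


percent_formatter = FuncFormatter(percent_label)
# Usage with a plot: ax.yaxis.set_major_formatter(percent_formatter)
print(percent_formatter(0.25), percent_formatter(0.125), percent_formatter(1.0))  # 25% 12.5% 100%
```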
For those who are looking for a quick one-liner:
plt.gca().set_yticklabels([f'{x:.0%}' for x in plt.gca().get_yticks()])
This assumes the import from matplotlib import pyplot as plt, and Python >= 3.6 for f-string formatting. For older versions, replace f'{x:.0%}' with '{:.0%}'.format(x).
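One caveat with the set_yticklabels approach: in recent Matplotlib versions, setting labels while the tick positions are still chosen automatically emits a warning that FixedFormatter should only be used together with FixedLocator, and the labels can end up on the wrong ticks after zooming. A sketch that pins the ticks first, on a made-up line from 0 to 1:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

plt.plot(np.linspace(0, 1, 20))  # hypothetical data in the 0 to 1 range
ax = plt.gca()
ticks = ax.get_yticks()
ax.set_yticks(ticks)                              # pin the tick positions first...
ax.set_yticklabels([f"{t:.0%}" for t in ticks])   # ...then label exactly those positions
```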
I'm late to the game, but I just realized this: ax can be replaced with plt.gca() for those who are not using axes and just subplots.
Echoing @Mad Physicist's answer, using the PercentFormatter class it would be:
import matplotlib.ticker as mtick
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1))
#if you already have ticks in the 0 to 1 range. Otherwise see their answer
I propose an alternative method using seaborn
Working code:
import numpy as np
import pandas as pd
import seaborn as sns
data=np.random.rand(10,2)*100
df = pd.DataFrame(data, columns=['A', 'B'])
ax= sns.lineplot(data=df, markers= True)
ax.set(xlabel='xlabel', ylabel='ylabel', title='title')
# change the y tick labels
y_value=['{:,.2f}'.format(x) + '%' for x in ax.get_yticks()]
ax.set_yticklabels(y_value)
You can do this in one line without importing anything:
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{}%'.format))
If you want integer percentages, you can do:
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter('{:.0f}%'.format))
You can use either ax.yaxis or plt.gca().yaxis. FuncFormatter is still part of matplotlib.ticker, but you can also do plt.FuncFormatter as a shortcut.
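Why this works: FuncFormatter calls its function as func(value, pos), and str.format silently ignores positional arguments its template does not use. Note that these formatters only append a percent sign; they do not multiply by 100, so they suit data already in the 0 to 100 range. A quick check:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# The bound str.format methods receive (value, pos); pos is simply ignored.
integer_percent = plt.FuncFormatter("{:.0f}%".format)
plain_percent = plt.FuncFormatter("{}%".format)

print(integer_percent(42.0, 0))  # 42%
print(plain_percent(42.0, 0))    # 42.0%
```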
Based on @erwanp's answer, you can use the formatted string literals of Python 3,
x = '2'
percentage = f'{x}%' # 2%
inside FuncFormatter(), combined with a lambda expression.
All wrapped up:
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: f'{y}%'))
Another one line solution if the yticks are between 0 and 1:
plt.yticks(plt.yticks()[0], ['{:,.0%}'.format(x) for x in plt.yticks()[0]])
Add a line of code (after importing matplotlib.ticker as ticker):
import matplotlib.ticker as ticker
ax.yaxis.set_major_formatter(ticker.PercentFormatter())
