I would like to move the x-axis to the top of my plot and manually fill the colors. However, the usual method in ggplot does not work in plotnine. When I provide the position='top' in my scale_x_continuous() I receive the warning: PlotnineWarning: scale_x_continuous could not recognize parameter 'position'. I understand position is not in plotnine's scale_x_continuous, but what is the replacement? Also, scale_fill_manual() results in an Invalid RGBA argument: 'color' error. Specifically, the value requires an array-like object. Thus I provided the array of colors, but still had an issue. How do I manually set the colors for a scale_fill object?
import numpy as np
import pandas as pd
from plotnine import *
lst = [[1,1,'a'],[2,2,'a'],[3,3,'a'],[4,4,'b'],[5,5,'b']]
df = pd.DataFrame(lst, columns =['xx', 'yy','lbls'])
fill_clrs = {'a': 'goldenrod1',
'b': 'darkslategray3'}
ggplot()+\
geom_tile(aes(x='xx', y='yy', fill = 'lbls'), df) +\
geom_text(aes(x='xx', y='yy', label='lbls'),df, color='white')+\
scale_x_continuous(expand=(0,0), position = "top")+\
scale_fill_manual(values = np.array(list(fill_clrs.values())))
Plotnine does not support changing the position of any axis.
You can pass a list or a dict of colour values to scale_fill_manual provided they are recognisable colour names. The colours you have are obscure and they are not recognised. To see that it works try 'red' and 'green', see https://matplotlib.org/gallery/color/named_colors.html for all the named colors. Otherwise, you can also use hex colors e.g. #ff00cc.
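One way to check in advance whether a colour name will be accepted is to ask Matplotlib directly, since the "Invalid RGBA argument" error comes from its colour parser. The sketch below is only an illustration: the replacement names goldenrod and darkslategray are my own suggestions for close equivalents of the R names in the question, not something plotnine prescribes.

```python
from matplotlib.colors import is_color_like, to_hex

# Colour names from the question (R names) plus candidate replacements.
candidates = ["goldenrod1", "darkslategray3", "goldenrod", "darkslategray", "#ff00cc"]

# is_color_like() reports whether Matplotlib can interpret a colour spec.
recognised = {name: is_color_like(name) for name in candidates}

# Keep only the valid colours, normalised to hex strings.
hex_values = {name: to_hex(name) for name, ok in recognised.items() if ok}

print(recognised)
print(hex_values)
```

A dict of valid values keyed by the labels, for example {'a': '#daa520', 'b': '#2f4f4f'}, could then be passed as values to scale_fill_manual.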
I'd like to style a Pandas DataFrame display with a background color that is based on the logarithm (base 10) of a value, rather than the data frame value itself. The numeric display should show the original values (along with specified numeric formatting), rather than the log of the values.
I've seen many solutions involving the apply and applymap methods, but am not really clear on how to use these, especially since I don't want to change the underlying dataframe.
Here is an example of the type of data I have. Using the "gradient" to highlight is not satisfactory, but highlighting based on the log base 10 would be really useful.
import pandas as pd
import numpy as np
E = np.array([1.26528431e-03, 2.03866202e-04, 6.64793821e-05, 1.88018687e-05,
4.80967314e-06, 1.22584958e-06, 3.09260354e-07, 7.76751705e-08])
df = pd.DataFrame(E,columns=['Error'])
df.style.format('{:.2e}'.format).background_gradient(cmap='Blues')
Since pandas 1.3.0, background_gradient now has a gmap (gradient map) argument that allows you to set the values that determine the background colors.
See the examples here (this link is to the dev docs - can be replaced once 1.3.0 is released) https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.io.formats.style.Styler.background_gradient.html#pandas.io.formats.style.Styler.background_gradient
I figured out how to use the apply function to do exactly what I want. And also, I discovered a few more features in Matplotlib's colors module, including LogNorm which normalizes using a log. So in the end, this was relatively easy.
What I learned:
Do not use background_gradient, but rather supply your own function that maps DataFrame values to colors. The argument to the function is the DataFrame to be displayed. The return value should be a DataFrame with the same index and columns, but with the values replaced by CSS strings, e.g. background-color:#ffaa44.
Pass this function as an argument to apply.
import pandas as pd
import numpy as np
from matplotlib import colors
import seaborn as sns

def color_log(x):
    # Work on a copy so the displayed DataFrame is left unchanged
    df = x.copy()
    cmap = sns.color_palette("spring", as_cmap=True).reversed()
    evals = df['Error'].values
    # Normalise on a log scale so each decade gets an equal share of the colormap
    norm = colors.LogNorm(vmin=1e-10, vmax=1)
    normed = norm(evals)
    cstr = "background-color: {:s}".format
    # cmap is already a Colormap object, so it can be called directly
    c = [cstr(colors.rgb2hex(rgba)) for rgba in cmap(normed)]
    df['Error'] = c
    return df
E = np.array([1.26528431e-03, 2.03866202e-04, 6.64793821e-05, 1.88018687e-05,
4.80967314e-06, 1.22584958e-06, 3.09260354e-07, 7.76751705e-08])
df = pd.DataFrame(E,columns=['Error'])
df.style.format('{:.2e}'.format).apply(color_log,axis=None)
Note (1) The second argument to the apply function is an "axis". By supplying axis=None, the entire data frame will be passed to color_log. Passing axis=0 will pass in each column of the data frame as a Series. In this case, the code supplied above will not work. However, this would be useful for dataframes in which each column should be handled separately.
Note (2) If axis=None is used, and the DataFrame has more than one column, the color mapping function passed to apply should set colors for all columns in the DataFrame. For example,
df.loc[:, :] = 'background-color:#eeeeee'
would set all columns to grey. Selected columns could then be overwritten with other colour choices.
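As a sketch of what such a multi-column function might look like: the Step column, the 1e-5 threshold and the name color_columns are hypothetical, chosen only to illustrate a grey default with one column selectively overwritten.

```python
import pandas as pd

def color_columns(data):
    """Return a frame of CSS strings with the same shape and labels as `data`."""
    # Start with a grey background for every cell.
    css = pd.DataFrame('background-color: #eeeeee',
                       index=data.index, columns=data.columns)
    # Overwrite one column: highlight errors below a (hypothetical) threshold.
    small = data['Error'] < 1e-5
    css.loc[small, 'Error'] = 'background-color: #ffaa44'
    return css

df = pd.DataFrame({'Step': [1, 2, 3],
                   'Error': [1.3e-3, 6.6e-5, 4.8e-6]})
css = color_columns(df)
print(css)
```

The function would then be handed to the Styler with df.style.apply(color_columns, axis=None).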
I would be happy to know if there is yet a simpler way to do this.
I am trying to build a waterfall chart using plotnine. I would like to colour the starting and ending bars as grey (ideally I want to specify hexadecimal colours), increases as green and decreases as red.
Below is some sample data and my current plot. I am trying to set fill to the pandas column colour, but the bars are all black. I have also tried putting fill in the geom_segment, but this does not work either.
df = pd.DataFrame({})
df['label'] = ('A','B','C','D','E')
df['percentile'] = (10)*5
df['value'] = (100,80,90,110,110)
df['yStart'] = (0,100,80,90,0)
df['barLabel'] = ('100','-20','+10','+20','110')
df['labelPosition'] = ('105','75','95','115','115')
df['colour'] = ('grey','red','green','green','grey')
p = (ggplot(df, aes(x=np.arange(0,5,1), xend=np.arange(0,5,1), y='yStart',yend='value',fill='colour'))
+ theme_light(6)
+ geom_segment(size=10)
+ ylab('value')
+ scale_y_continuous(breaks=np.arange(0,141,20), limits=[0,140], expand=(0,0))
)
EDIT
Based on teunbrand's comment of changing fill to color, I have the following. How do I specify the actual colour, preferably in hexadecimal format?
Just to close this question off, credit goes to teunbrand in the comments for the solution.
geom_segment() has a colour aesthetic but not a fill aesthetic. Replace fill='colour' with colour='colour'.
Plotnine will use default colours for the bars. Use scale_color_identity() if the contents of the DataFrame column are literal colours, or scale_colour_manual() to manually specify a tuple or list of colours. Both forms accept hexadecimal colours.
I'm interested in being able to recreate this multidimensional strip plot below, generated by the Missing Numbers python library, using vega-lite, and I'm looking for a few pointers on how I might do this. The code to generate the image below looks a bit like this snippet:
>>> from quilt.data.ResidentMario import missingno_data
>>> collisions = missingno_data.nyc_collision_factors()
>>> collisions = collisions.replace("nan", np.nan)
>>> import missingno as msno
>>> %matplotlib inline
>>> msno.matrix(collisions.sample(250))
For each column, there is a mark shown for a specific combination of the index, and where the data is null, or not null.
When I look through a gallery of charts created by Altair, I see this horizontal strip plot, which seems to be presenting a similar kind of information, but I'm not sure how to express the same idea.
The viz below is showing a mark when there is data that matches a given combination of horse power and cylinder size - the horsepower and cylinder are encoded into the x and y channels.
I'm not sure how I'd express the same for the cool nullity matrix thing, and I think I need some pointers here.
I get that I can reset an index to come up with a y index, but it's not clear to me how the index of the sample is encoded in the Y channel, and I'm not sure how I'd populate the x-axis with a column listing the null/not null results. Is this something I'd need to do before the data gets to vega-lite, or does vega support it?
Yes, you can do this after reshaping your data with a Fold Transform. It looks something like this using Altair:
import numpy as np
import quilt
quilt.install("ResidentMario/missingno_data")
from quilt.data.ResidentMario import missingno_data
collisions = missingno_data.nyc_collision_factors()
collisions = collisions.replace("nan", np.nan)
collisions = collisions.set_index("Unnamed: 0")
import altair as alt
alt.Chart(collisions.sample(250)).transform_window(
index='row_number()'
).transform_fold(
collisions.columns.to_list()
).transform_calculate(
defined="isValid(datum.value)"
).mark_rect().encode(
x=alt.X('key:N',
title=None,
sort=collisions.columns.to_list(),
axis=alt.Axis(orient='top', labelAngle=-45)
),
y=alt.Y('index:O', title=None),
color=alt.Color('defined:N',
legend=None,
scale=alt.Scale(domain=["true", "false"], range=["black", "white"])
)
).properties(
width=800, height=400
)
What I would like to achieve:
I want to create several pie charts on one figure. They all share some categories but sometimes have different ones. Obviously I want all of the same categories to have the same colors.
That is why I created a dictionary which links the categories (= labels) to the colors. With that I can specify the colors of the pie chart. But I would like to use the ggplot colors (which come with matplotlib.style.use('ggplot')). How can I get those colors to feed them into my dictionary?
# set colors for labels
color_dict = {}
for i in range(0, len(data_categories)):
    color_dict[data_categories[i]] = ???
# apply colors
ind_label = 0
for pie_wedge in pie[0]:
    leg = ax[ind].get_legend()
    pie_wedge.set_facecolor(color_dict[labels_0[ind_label]])
    leg.legendHandles[ind_label].set_color(color_dict[labels_0[ind_label]])
    ind_label += 1
Short answer
To access the colors used in the ggplot style, you can do as follows
In [37]: import matplotlib.pyplot as plt
In [38]: plt.style.use('ggplot')
In [39]: colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
In [40]: print('\n'.join(color for color in colors))
#E24A33
#348ABD
#988ED5
#777777
#FBC15E
#8EBA42
#FFB5B8
In the above example the colors, as RGB strings, are contained in the list colors.
Remember to call plt.style.use(...) before accessing the color list, otherwise you'll find the standard colors.
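To connect this back to the dictionary in the question, here is a minimal sketch that maps category labels to the ggplot colors. The category names are hypothetical placeholders for data_categories, and the use of plt.style.context (so that the global style is left untouched) is my own choice rather than a requirement.

```python
from itertools import cycle

import matplotlib.pyplot as plt

# Hypothetical category labels standing in for data_categories.
data_categories = ['rent', 'food', 'travel', 'other']

# Read the colors inside a style context so global settings stay untouched.
with plt.style.context('ggplot'):
    ggplot_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

# cycle() restarts the palette if there are more categories than colors.
color_dict = dict(zip(data_categories, cycle(ggplot_colors)))
print(color_dict)
```

Each wedge can then look up its color with color_dict[label], so a category keeps the same color in every pie chart.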
More detailed explanation
The answer above is tailored for modern releases of Matplotlib, where the plot colors and possibly other plot properties, like line widths and dashes (see this answer of mine) are stored in the rcParams dictionary with the key 'axes.prop_cycle' and are contained in a new kind of object, a cycler (another explanation of a cycler is contained in my answer referenced above).
To get the list of colors, we have to get the cycler from rcParams and then use its .by_key() method
Signature: c.by_key()
Docstring: Values by key
This returns the transposed values of the cycler. Iterating
over a `Cycler` yields dicts with a single value for each key,
this method returns a `dict` of `list` which are the values
for the given key.
The returned value can be used to create an equivalent `Cycler`
using only `+`.
Returns
-------
transpose : dict
dict of lists of the values for each key.
to have a dictionary of values that, at last, we index using the key 'color'.
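The transposition performed by .by_key() may be easier to see on a small hand-built cycler. This sketch uses the cycler package that ships with Matplotlib; the two properties and their values are arbitrary illustrations.

```python
from cycler import cycler

# Combine two properties; `+` pairs the values up position by position.
c = cycler(color=['r', 'g']) + cycler(lw=[1, 2])

# Iterating yields one dict per step of the cycle.
steps = list(c)

# by_key() transposes that into one list per property.
by_key = c.by_key()

print(steps)
print(by_key)
```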
Addendum
Updated, 2023-01-01.
It is not strictly necessary to call plt.style.use('a_style') to access a style's colors; the colors are (possibly) defined in a matplotlib.RcParams object that is stored in the dictionary matplotlib.style.library.
E.g., let's print all the color sequences defined in the different styles
In [23]: for style in sorted(plt.style.library):
    ...:     the_rc = plt.style.library[style]
    ...:     if 'axes.prop_cycle' in the_rc:
    ...:         colors = the_rc['axes.prop_cycle'].by_key()['color']
    ...:         print('%25s: %s'%(style, ', '.join(color for color in colors)))
    ...:     else:
    ...:         print('%25s: this style does not modify colors'%style)
Solarize_Light2: #268BD2, #2AA198, #859900, #B58900, #CB4B16, #DC322F, #D33682, #6C71C4
_classic_test_patch: this style does not modify colors
_mpl-gallery: this style does not modify colors
_mpl-gallery-nogrid: this style does not modify colors
bmh: #348ABD, #A60628, #7A68A6, #467821, #D55E00, #CC79A7, #56B4E9, #009E73, #F0E442, #0072B2
classic: b, g, r, c, m, y, k
dark_background: #8dd3c7, #feffb3, #bfbbd9, #fa8174, #81b1d2, #fdb462, #b3de69, #bc82bd, #ccebc4, #ffed6f
fast: this style does not modify colors
fivethirtyeight: #008fd5, #fc4f30, #e5ae38, #6d904f, #8b8b8b, #810f7c
ggplot: #E24A33, #348ABD, #988ED5, #777777, #FBC15E, #8EBA42, #FFB5B8
grayscale: 0.00, 0.40, 0.60, 0.70
seaborn: #4C72B0, #55A868, #C44E52, #8172B2, #CCB974, #64B5CD
seaborn-bright: #003FFF, #03ED3A, #E8000B, #8A2BE2, #FFC400, #00D7FF
seaborn-colorblind: #0072B2, #009E73, #D55E00, #CC79A7, #F0E442, #56B4E9
seaborn-dark: this style does not modify colors
seaborn-dark-palette: #001C7F, #017517, #8C0900, #7600A1, #B8860B, #006374
seaborn-darkgrid: this style does not modify colors
seaborn-deep: #4C72B0, #55A868, #C44E52, #8172B2, #CCB974, #64B5CD
seaborn-muted: #4878CF, #6ACC65, #D65F5F, #B47CC7, #C4AD66, #77BEDB
seaborn-notebook: this style does not modify colors
seaborn-paper: this style does not modify colors
seaborn-pastel: #92C6FF, #97F0AA, #FF9F9A, #D0BBFF, #FFFEA3, #B0E0E6
seaborn-poster: this style does not modify colors
seaborn-talk: this style does not modify colors
seaborn-ticks: this style does not modify colors
seaborn-white: this style does not modify colors
seaborn-whitegrid: this style does not modify colors
tableau-colorblind10: #006BA4, #FF800E, #ABABAB, #595959, #5F9ED1, #C85200, #898989, #A2C8EC, #FFBC79, #CFCFCF
In my understanding
the seaborn-xxx styles that do not modify colors are to be used as the last step in a sequence of styles, e.g., plt.style.use(['seaborn', 'seaborn-poster']) or plt.style.use(['seaborn', 'seaborn-muted', 'seaborn-poster'])
also the _ starting styles are meant to modify other styles, and
the only other style that does not modify the colors, fast, is all about tweaking the rendering parameters to get faster rendering.
I am trying to create an image to use as a test pattern for a new colormap I'm creating. The map is supposed to have nine unique colors with breaks at the integers from 0-8. The colormap itself is fine, but I can't seem to generate the image itself.
I'm using pandas to make the test array like this:
mask=pan.DataFrame(index=np.arange(0,100),columns=np.arange(1,91))
mask.ix[:,1:10]=0.0
mask.ix[:,11:20]=1.0
mask.ix[:,21:30]=2.0
mask.ix[:,31:40]=3.0
mask.ix[:,41:50]=4.0
mask.ix[:,51:60]=5.0
mask.ix[:,61:70]=6.0
mask.ix[:,71:80]=7.0
mask.ix[:,81:90]=8.0
Maybe not the most elegant method, but it creates the array I want.
However, when I try to plot it using either imshow or pcolor I get an error. So:
fig=plt.figure()
ax=fig.add_subplot(111)
image=ax.imshow(mask)
fig.canvas.draw()
yields the error: "TypeError: Image data can not convert to float"
and substituting pcolor for imshow yields this error: "AttributeError: 'float' object has no attribute 'view'"
However, when I replace the values in mask with anything else - say random numbers - it plots just fine:
mask=pan.DataFrame(data=rand(100,90),index=np.arange(0,100),columns=np.arange(1,91))
fig=plt.figure()
ax=fig.add_subplot(111)
image=ax.imshow(mask)
fig.canvas.draw()
yields the standard colored speckle one would expect (no errors).
The problem here is that your dataframe is full of objects, not numbers. You can see it if you do mask.dtypes. If you want to use pandas dataframes, create mask by specifying the data type:
mask=pan.DataFrame(index=np.arange(0,100),columns=np.arange(1,91), dtype='float')
otherwise pandas cannot know which data type you want. After that change your code should work.
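The difference is easy to verify. The sketch below builds the frame both ways and fills the float version band by band. It uses .loc because .ix is no longer available in current pandas releases, and the loop is my own compact stand-in for the nine assignments in the question.

```python
import numpy as np
import pandas as pd

idx, cols = np.arange(0, 100), np.arange(1, 91)

# Without data or dtype, every column is created with object dtype.
empty = pd.DataFrame(index=idx, columns=cols)

# With dtype='float', the columns hold NaN floats until they are filled.
mask = pd.DataFrame(index=idx, columns=cols, dtype='float')
for band in range(9):
    # .loc slices by label and includes both endpoints.
    mask.loc[:, 10 * band + 1:10 * band + 10] = float(band)

print(empty.dtypes.unique(), mask.dtypes.unique())
```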
However, if you want to just test the color maps with integers, then you might be better off using simple numpy arrays:
mask = np.empty((100,90), dtype='int')
mask[:, :10] = 0
mask[:, 10:20] = 1
...
And, of course, there are shorter ways to do that filling, as well. For example:
mask[:] = np.arange(90)[None,:] // 10
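Another possible way to build the same pattern without preallocating is to repeat and tile. This is only a sketch; the names row and mask are illustrative.

```python
import numpy as np

# Nine values 0..8, each repeated for ten columns: one row of 90 entries.
row = np.repeat(np.arange(9), 10)

# Stack the same row 100 times to get the (100, 90) test image.
mask = np.tile(row, (100, 1))

print(mask.shape, np.unique(mask))
```

The resulting integer array can be passed straight to ax.imshow(mask) together with the colormap under test.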