Map boolean values to strings - python

I am plotting a graph where my x variable is 'Mg' and my y variable is 'Si'. I have a third variable called 'binary'. If binary is equal to 0 or 1, how do I colour the plotted point in red or black respectively?
I need to use plt.scatter and colorbar(). I've read about colorbar, but it seems to generate a continuous spectrum of colours. I've tried using plt.colors.from_levels_and_colors instead, but I'm not really sure how to use it properly.
levels = [0,1]
colors = ['r','b']
cmap, norm = plt.colors.from_levels_and_colors(levels, colors)
plt.scatter(data_train['Mg'], data_train['Si'], c = data_train['binary'])
plt.show()
Also, in future, instead of asking a question like this on this forum, what can I do to solve the problem on my own? I try to read the documentation online first, but I often find it hard to understand.

np.where makes encoding binary values easy.
np.where([1, 0, 0, 1], 'yes', 'no')
# array(['yes', 'no', 'no', 'yes'], dtype='<U3')
colors = np.where(data_train['binary'], 'black', 'red')
plt.scatter(data_train['Mg'], data_train['Si'], c=colors)

If you're working with a few discrete ("qualitative") colours rather than a colormap, you should probably convert c from the binary values into a Matplotlib-friendly format, i.e. one colour per point:
colors = ['red', 'black']  # index 0 -> red, index 1 -> black
point_colors = [colors[binary] for binary in data_train['binary']]
plt.scatter(data_train['Mg'], data_train['Si'], c=point_colors)
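Because the question specifically asks for plt.scatter plus a colorbar, here is a sketch of the from_levels_and_colors route it attempted. That function lives in matplotlib.colors and, with the default extend='neither', needs one more level than colours (two colours need three boundaries). The returned cmap and norm also have to be passed to scatter, and 'b' is blue, not black. The data_train frame below is a hypothetical stand-in for the real data.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import from_levels_and_colors

# Hypothetical stand-in for the question's data_train.
data_train = pd.DataFrame({
    'Mg': [1.2, 3.4, 2.2, 0.5],
    'Si': [70.1, 72.5, 71.3, 73.0],
    'binary': [0, 1, 1, 0],
})

# Two colours need three boundaries: 0 falls in [-0.5, 0.5), 1 in [0.5, 1.5).
cmap, norm = from_levels_and_colors([-0.5, 0.5, 1.5], ['red', 'black'])

# Pass cmap and norm to scatter so the 0/1 values are mapped through them.
sc = plt.scatter(data_train['Mg'], data_train['Si'],
                 c=data_train['binary'], cmap=cmap, norm=norm)

# A discrete colorbar with one tick per category.
cbar = plt.colorbar(sc, ticks=[0, 1])
plt.show()
```

The colorbar then shows two solid blocks (red for 0, black for 1) instead of a continuous gradient.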

Related

Customizable heat map for strings

I am writing because I could not find a satisfactory answer to my question. Specifically, I have a pandas DataFrame in which every cell is a string. It looks like this:
Own AUSTRIA. Own BELGIUM.
"-1.3" "-0.34"
"-0.43" "-1.89**"
"-1.2**" "-4.5"
"-1.9" "-2.3"
"-2**" "-6.1**"
"-.7" "-0.3"
"-0.06" "-7.2**"
... ...
"-1.1**" "-10.34"
My goal is to produce a heatmap in which the entries containing "**" are coloured red and all the others blue (other colours are fine). I know that a heatmap colours cells according to their values, but if I try to rescale the values containing "**", either other values get caught up as well, or I have to set extremely high (or low) values so that the heatmap knows which cells to colour.
Thank you,
Federico
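Build a separate 'heatmap DataFrame' that holds large values for the cells you want red and small values for the cells you want blue. The test has to be applied cell by cell: DataFrame.apply would pass whole columns, and '**' in a Series checks the index, not the values.
# Element-wise: cells containing "**" -> 255 (red end), all others -> 0 (blue end).
# (In pandas >= 2.1, applymap is deprecated in favour of DataFrame.map.)
heatmap = df.applymap(lambda s: 255 if '**' in s else 0)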
You can of course use any range you want, like [0, 1], [0, 255], [-1, 1].
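A fuller sketch of the same idea, assuming matplotlib's imshow is an acceptable way to draw the heatmap (the sample strings are taken from the question's table). Instead of choosing numbers such as 255, it builds a boolean mask with Series.str.contains and draws it with a two-colour ListedColormap, so every "**" cell is red and every other cell blue regardless of its numeric value.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Sample data shaped like the question's DataFrame (every cell is a string).
df = pd.DataFrame({
    "Own AUSTRIA.": ["-1.3", "-0.43", "-1.2**", "-1.9", "-2**"],
    "Own BELGIUM.": ["-0.34", "-1.89**", "-4.5", "-2.3", "-6.1**"],
})

# Boolean mask: True where the cell contains the literal "**".
# regex=False is required: "**" is not a valid regular expression.
mask = df.apply(lambda col: col.str.contains("**", regex=False))

# Two-colour colormap: 0 (False) -> blue, 1 (True) -> red.
cmap = ListedColormap(["blue", "red"])

fig, ax = plt.subplots()
ax.imshow(mask.astype(int).to_numpy(), cmap=cmap, vmin=0, vmax=1)

# Write the original strings on top of each cell.
for i in range(df.shape[0]):
    for j in range(df.shape[1]):
        ax.text(j, i, df.iat[i, j], ha="center", va="center", color="white")

ax.set_xticks(range(df.shape[1]))
ax.set_xticklabels(df.columns)
ax.set_yticks([])
plt.show()
```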

Plotnine's scale fill and axis position

I would like to move the x-axis to the top of my plot and set the fill colours manually. However, the usual ggplot approach does not work in plotnine. When I pass position='top' to scale_x_continuous(), I receive the warning: PlotnineWarning: scale_x_continuous could not recognize parameter 'position'. I understand that position is not a parameter of plotnine's scale_x_continuous, but what is the replacement? Also, scale_fill_manual() results in an Invalid RGBA argument: 'color' error. The values parameter requires an array-like object, so I provided an array of colours, but I still had the issue. How do I set the colours manually for a scale_fill object?
import numpy as np
import pandas as pd
from plotnine import *
lst = [[1,1,'a'],[2,2,'a'],[3,3,'a'],[4,4,'b'],[5,5,'b']]
df = pd.DataFrame(lst, columns =['xx', 'yy','lbls'])
fill_clrs = {'a': 'goldenrod1',
'b': 'darkslategray3'}
ggplot()+\
geom_tile(aes(x='xx', y='yy', fill = 'lbls'), df) +\
geom_text(aes(x='xx', y='yy', label='lbls'),df, color='white')+\
scale_x_continuous(expand=(0,0), position = "top")+\
scale_fill_manual(values = np.array(list(fill_clrs.values())))
Plotnine does not support changing the position of any axis.
You can pass a list or a dict of colour values to scale_fill_manual, provided they are recognisable colour names. The names you used (goldenrod1, darkslategray3) are R colour names that matplotlib, which plotnine uses for drawing, does not recognise. To see that it works, try 'red' and 'green'; see https://matplotlib.org/gallery/color/named_colors.html for all the named colours. Alternatively, you can use hex colours, e.g. '#ff00cc'.

Visualize multiple 2d Array with same color scheme

I am currently trying to visualize three 2D arrays with the same colour scheme. The arrays are 13x13 and contain integers. In an external file I have a hex colour code for each integer.
When I visualize the arrays, two of the three look good: all numbers match their colour codes and the arrays are displayed correctly. But in the last picture, part of the data is not assigned the correct colour.
import numpy as np
import matplotlib.colors
from matplotlib import pyplot

# `colors` holds the path to the file with one hex code per line
with open(colors) as f:
    color_names = [c.strip() for c in f]
color_dict = {v: k for v, k in enumerate(color_names)}
unique_classes = (np.unique(np.asarray(feature_map))).tolist()
number_classes = len(unique_classes)
color_code = [color_dict.get(cla) for cla in unique_classes]
cmap = matplotlib.colors.ListedColormap(color_code)
norm = matplotlib.colors.BoundaryNorm(unique_classes, cmap.N)
img = pyplot.imshow(feature_map[0],interpolation='nearest',
cmap = cmap,norm=norm)
pyplot.colorbar(img,cmap=cmap,
norm=norm,boundaries=unique_classes)
pyplot.show()
img1 = pyplot.imshow(feature_map[1],interpolation='nearest',
cmap = cmap,norm=norm)
pyplot.show()
img2 = pyplot.imshow(feature_map[2],interpolation='nearest',
cmap = cmap,norm=norm)
pyplot.colorbar(img2,cmap=cmap,
norm=norm,boundaries=unique_classes)
pyplot.show()
This is exactly the data shown in the pictures:
feature_map = [[[25,25,25,25,56,56,2,2,2,2,2,2,25],[25,25,25,25,25,25,59,7,72,72,72,72,2],[25,25,25,25,25,25,59,72,72,72,72,72,2],[25,25,25,24,24,24,62,0,0,0,0,25,25],[25,25,24,24,24,24,24,24,24,24,25,25,25],[26,26,24,24,24,24,24,26,26,26,6,6,6],[26,26,26,24,24,26,26,26,26,26,26,6,6],[26,26,26,0,0,26,26,26,26,26,26,6,6],[28,28,28,28,28,28,28,26,26,26,26,6,6],[28,28,28,28,28,28,28,26,26,26,13,13,6],[28,28,28,28,28,28,28,26,13,13,13,13,13],[28,28,28,28,28,28,28,13,13,13,13,13,13],[28,28,28,28,28,28,28,13,13,13,13,13,13]],[[25,25,25,25,59,56,59,2,0,0,0,0,0],[25,25,25,25,25,59,59,7,72,72,72,72,72],[25,25,25,25,25,25,59,72,72,72,72,72,72],[25,25,25,0,0,25,25,6,0,0,0,72,0],[25,25,0,0,0,0,6,0,0,0,0,25,6],[26,26,26,0,0,0,24,26,0,0,6,6,6],[26,26,26,0,0,0,26,26,26,26,26,6,6],[0,26,0,0,0,0,26,26,0,26,26,6,6],[0,28,28,28,28,28,28,26,0,26,26,6,6],[28,28,28,28,28,28,28,26,0,26,0,0,0],[28,28,28,28,28,28,28,26,13,13,13,13,0],[56,56,28,28,28,28,28,13,13,13,13,13,13]],[[0,28,28,28,28,28,28,13,13,13,13,13,0],[25,25,25,25,59,59,59,4,0,0,0,0,0],[25,25,25,25,59,59,59,7,7,7,72,72,6],[25,25,25,25,25,25,59,7,7,73,73,25,0],[25,25,25,0,0,25,6,7,0,6,6,6,0],[25,0,0,0,6,6,6,6,0,0,6,6,6],[0,0,0,0,0,6,6,6,0,0,6,6,6],[0,0,0,0,0,0,6,6,0,0,6,6,6],[0,0,0,0,0,0,6,0,0,0,6,6,6],[0,0,28,0,28,28,13,0,0,0,6,6,6],[28,28,28,28,28,28,13,13,13,0,13,6,6],[28,28,28,28,28,28,28,13,13,13,13,13,13],[56,28,28,28,28,28,28,13,13,13,13,13,13],[28,28,28,28,28,28,28,13,13,13,13,13,13]]]
The colour code file simply contains one hex code per line, such as #deb887.
I have been working on this problem for several hours and can't reproduce it at the moment.
I have tried to reproduce your results, and something caught my attention.
If you look closely at the values in feature_map[2], you will see that the pixel you say is misclassified actually has a different value from the pixels around it, so it has the correct colour for its value. I therefore think the problem is not a misclassification but your data. That is my answer if by "part of the data" you mean the pixel at position (0, 11); otherwise I have misunderstood the question, and I apologise.
NOTE: I picked random colours for my reproduction, so don't worry if they don't match yours.
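Separately from the data issue above, one thing worth checking in the question's code is BoundaryNorm(unique_classes, cmap.N): N class values used as boundaries define only N - 1 bins for N colours, so matplotlib has to interpolate bin numbers onto colour indices and a value may not land on the colour you expect. A sketch that avoids this, assuming each value is first converted to its index in one shared class list, puts the boundaries half-way between indices so there is exactly one bin per colour. The colour dictionary, the two small arrays and the helper class_colormap are hypothetical stand-ins for the question's file, feature_map and code.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, BoundaryNorm

def class_colormap(classes, color_dict):
    """Hypothetical helper: index k is drawn with the colour of classes[k]."""
    cmap = ListedColormap([color_dict[c] for c in classes])
    # N classes -> N + 1 boundaries -> exactly N bins, one per colour.
    norm = BoundaryNorm(np.arange(len(classes) + 1) - 0.5, cmap.N)
    return cmap, norm

# Hypothetical stand-ins for the colour file and the arrays.
color_dict = {0: "#000000", 6: "#deb887", 13: "#1f77b4", 25: "#2ca02c"}
maps = [np.array([[0, 6], [13, 25]]), np.array([[25, 25], [6, 0]])]

# One shared class list for every array, so all plots use the same scheme.
classes = np.unique(np.concatenate([m.ravel() for m in maps]))
cmap, norm = class_colormap(classes.tolist(), color_dict)

fig, axes = plt.subplots(1, len(maps))
for ax, m in zip(axes, maps):
    idx = np.searchsorted(classes, m)  # class value -> position in `classes`
    im = ax.imshow(idx, cmap=cmap, norm=norm, interpolation="nearest")

# Colorbar ticks at the indices, labelled with the original class values.
cbar = fig.colorbar(im, ax=axes, ticks=np.arange(len(classes)))
cbar.ax.set_yticklabels([str(c) for c in classes])
plt.show()
```

Collecting the classes from the flattened arrays (rather than np.asarray on the nested list) also works if the arrays differ in shape.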

How do the x and y parameters in the Label object work for bokeh?

I've read the documentation for the Label class in Bokeh but the x and y parameters are quite confusing. Their behavior seems to change if you pass something to the x_units and y_units parameters but I don't understand what the units are supposed to be by default.
More specifically, I have a list of strings that I'm using for my x-axis:
xlab = [
'COREPCE2',
'COREPCE3',
'COREPCE4',
'COREPCE5',
'COREPCE6',
'',
'T5YIE'
]
p = figure(..., y_range = (0,.04), x_range = xlab)
If I wanted to draw basically anything else on the plot, I could just use those strings. For example I drew some lines like this:
p.line(['COREPCE2', 'T5YIE'], [.02,.02], color = 'black', line_dash = 'dashed')
p.line(['', ''], [0,.04], color = 'black')
That works fine; this is the full chart.
Here's the issue though. I want to put a text label on the "COREPCE4" location of the x axis. If I try just passing the string for the x parameter in the Label class it just doesn't work:
section = Label(x = 'COREPCE4', y = .03, text = 'Survey of Professional Forecasters: August 9, 2019')
p.add_layout(section)
It throws an error: ValueError: expected a value of type Real, got COREPCE4 of type str. I don't really know what units it's expecting. Is there a way to make Bokeh recognize that I want to use the x-axis label as my x parameter, in the same way I've done with the other glyphs?
The properties x_units and y_units refer to screen (pixel) vs. data-space (axis) units. As of Bokeh 1.3.4, the x and y properties of Label can only be set from floating-point numbers, so they cannot be used directly with categorical coordinates. For now, you should use LabelSet, even if you are only showing a single label, since it can work with categorical coordinates.
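A sketch of the LabelSet approach, reusing the question's factors: the single label lives in a one-row ColumnDataSource, and LabelSet reads x and y from its columns, so x can be the category 'COREPCE4'. The column names x, y and text are arbitrary choices.

```python
from bokeh.models import ColumnDataSource, LabelSet
from bokeh.plotting import figure

xlab = ['COREPCE2', 'COREPCE3', 'COREPCE4', 'COREPCE5', 'COREPCE6', '', 'T5YIE']
p = figure(y_range=(0, .04), x_range=xlab)
p.line(['COREPCE2', 'T5YIE'], [.02, .02], color='black', line_dash='dashed')

# One-row data source; LabelSet looks up x/y/text by column name,
# so the x value can be a categorical factor.
source = ColumnDataSource(data=dict(
    x=['COREPCE4'],
    y=[.03],
    text=['Survey of Professional Forecasters: August 9, 2019'],
))
labels = LabelSet(x='x', y='y', text='text', source=source, text_align='center')
p.add_layout(labels)
```

Call show(p) from bokeh.plotting to display the figure.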

Pyplot set tick frequency and tick labels

I'm trying to make a plot with matplotlib where I want to specify both the position of the tick marks, and the text of the tick marks. I can individually do both with yticks(np.arange(0,1.1,1/16.)) and gca().set_yticklabels(['1','2','3']). However, for some reason when I do both of them together, the labels do not appear on the graph. Is there a reason for this? How can I get around it? Below is a working example of what I want to accomplish.
import numpy as np
from matplotlib.pyplot import *
x = [-1, -0.2, -0.15, 0.15, 0.2, 7.8, 7.85, 8.15, 8.2, 12]
y = [1, 1, 15/16., 15/16., 1, 1, 15/16., 15/16., 1, 1]
figure(1)
plot(x,y)
xlabel('Time (years)')
ylabel('Brightness')
yticks(np.arange(0,1.1,1/16.))
xticks(np.arange(0,13,2))
ylim(12/16.,16.5/16.)
xlim(-1,12)
gca().set_yticklabels(['12/16', '13/16', '14/16', '15/16', '16/16'])
show(block = False)
Effectively I just wanted to replace the numerical values with fractions, but when I run this, the labels do not appear. It seems that using both yticks() and set_yticklabels together is a problem because if I remove either line, the remaining line works as it should.
If anyone can indicate how to simply force the label to be a fraction, that would also solve my problem.
EDIT:
I found an ugly workaround:
from matplotlib.ticker import FixedLocator, FixedFormatter
ylim(12/16., 16.5/16)
gca().yaxis.set_major_locator(FixedLocator([12/16., 13/16., 14/16., 15/16., 16/16.]))
gca().yaxis.set_major_formatter(FixedFormatter(['12/16', '13/16', '14/16', '15/16', '16/16']))
While this may work for this specific example, it does not generalize well and it is cumbersome to specify the exact location and label of every tick mark. If anyone finds another solution, I'm all ears.
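1) Your arange call should produce 5 ticks, the same number as the labels you set. arange(0, 1.1, 1/16.) produces 18 ticks (0 to 17/16), so your five labels go to the first five positions (0 to 4/16), which lie outside your ylim and are never visible; recent Matplotlib versions raise a ValueError for this mismatch instead.
arange is not well suited to producing an exact number of points with a float step. It is better to use linspace, which takes the number of points directly.
2) You can set the ticks and their labels with a single call to yticks: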
plot(x,y)
xlabel('Time (years)')
ylabel('Brightness')
yticks(np.linspace(12/16., 1, 5), ('12/16', '13/16', '14/16', '15/16', '16/16') )
xticks(np.arange(0,13,2))
ylim(12/16.,16.5/16.)
xlim(-1,12)
3) Note that the tick positions must match the positions you want labelled on the axis: use linspace(12/16., 1, 5) instead of arange(0, 1.1, 1/16.).
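The question also asks how to force fraction labels in general, without listing every tick. One option (a sketch, not taken from the answer above) is to let MultipleLocator place a tick every 1/16 and a FuncFormatter render each value as n/16; the helper name sixteenths is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FuncFormatter

def sixteenths(value, pos):
    """Hypothetical tick formatter: render a value as 'n/16'."""
    return f"{round(value * 16)}/16"

x = [-1, -0.2, -0.15, 0.15, 0.2, 7.8, 7.85, 8.15, 8.2, 12]
y = [1, 1, 15/16., 15/16., 1, 1, 15/16., 15/16., 1, 1]

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel('Time (years)')
ax.set_ylabel('Brightness')
ax.set_xticks(np.arange(0, 13, 2))
ax.set_xlim(-1, 12)
ax.set_ylim(12/16., 16.5/16.)

# A tick every 1/16 across whatever range is visible, each labelled as a fraction.
ax.yaxis.set_major_locator(MultipleLocator(1/16))
ax.yaxis.set_major_formatter(FuncFormatter(sixteenths))
plt.show()
```

Changing the ylim alone then moves the labelled ticks with it, which is the generalisation the FixedLocator/FixedFormatter workaround lacks.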
