Python: Plt bar plot - different colors - python

In Python, how can I make the 'reported' bars green, and 'UNREPORTED' bars red?
I want to give a different color to the reported and the UNREPORTED bars in my graph.
new = (('AXIN', 37, 'reported'),
('LGR', 30, 'UNREPORTED'),
('NKD', 24, 'reported'),
('TNFRSF', 23, 'reported'),
('CCND', 19, 'reported'),
('APCDD', 18, 'reported'),
('TRD', 16, 'reported'),
('TOX', 15, 'UNREPORTED'),
('LEF', 15, 'reported'),
('MME', 13, 'reported'))
#sort them as most common gene comes first
new = sorted(new, key=lambda score: score[1], reverse=True)
#X, Y zip of the tuple new are for plt.bar
X, Y, _ = zip(*new)
import seaborn as sns
sns.set()
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize = (20, 10))
mytitle = "Most common genes coexpressed with {gene1}, {gene2}, {gene3}, {gene4}".format(
gene1="Axin2", gene2="Lef", gene3="Nkd1", gene4="Lgr5")
plt.title(mytitle, fontsize=40)
plt.ylabel('Number of same gene encounters across studies', fontsize=20)
ax = plt.bar(range(len(X)), Y, 0.6, tick_label = X, color="green")
ax = plt.xticks(rotation=90)
new = tuple(new)

You can iterate over the bars and check if for the given index, the report is 'UNREPORTED'. If this is the case, colorize the bar using set_color.
import seaborn as sns
import matplotlib.pyplot as plt
new = (('AXIN', 37, 'reported'),
('LGR', 30, 'UNREPORTED'),
('NKD', 24, 'reported'),
('TNFRSF', 23, 'reported'),
('CCND', 19, 'reported'),
('APCDD', 18, 'reported'),
('TRD', 16, 'reported'),
('TOX', 15, 'UNREPORTED'),
('LEF', 15, 'reported'),
('MME', 13, 'reported'))
#sort them as most common gene comes first
new = sorted(new, key=lambda score: score[1], reverse=True)
#X, Y zip of the tuple new are for plt.bar
X, Y, rep = zip(*new)
plt.figure(figsize = (8, 6))
mytitle = "Most common genes coexpressed with {gene1}, {gene2}, {gene3}, {gene4}".format(
gene1="Axin2", gene2="Lef", gene3="Nkd1", gene4="Lgr5")
plt.title(mytitle)
plt.ylabel('Number of same gene encounters across studies')
bars = plt.bar(range(len(X)), Y, 0.6, tick_label = X, color="green")
plt.xticks(rotation=90)
for i, bar in enumerate(bars):
    if rep[i] == 'UNREPORTED':
        bar.set_color("red")
plt.show()

You need to pass a list or tuple of colors to plt.bar instead of a single color. You can do so by creating a color dictionary and then building the list of colors from it.
new = sorted(new, key=lambda score: score[1], reverse=True)
# save the reporting type as R
X, Y, R = zip(*new)
# create color dictionary
color_dict = {'reported':'green', 'UNREPORTED':'red'}
plt.figure(figsize = (20, 10))
mytitle = "Most common genes coexpressed with {gene1}, {gene2}, {gene3}, {gene4}".format(
gene1="Axin2", gene2="Lef", gene3="Nkd1", gene4="Lgr5")
plt.title(mytitle, fontsize=40)
plt.ylabel('Number of same gene encounters across studies', fontsize=20)
# build the colors from the color dictionary
ax = plt.bar(range(len(X)), Y, 0.6, tick_label = X, color=[color_dict[r] for r in R])
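A minimal, self-contained sketch of the color-list approach above. The helper names `colors_for` and `plot_genes`, the shortened data, and the gray fallback for unknown statuses are assumptions added for illustration; they are not part of the original answer.

```python
# Sketch: derive one bar color per record from its reporting status.
# `colors_for`, `plot_genes` and the gray fallback are illustrative assumptions.
new = (('AXIN', 37, 'reported'), ('LGR', 30, 'UNREPORTED'),
       ('NKD', 24, 'reported'), ('TOX', 15, 'UNREPORTED'))

color_dict = {'reported': 'green', 'UNREPORTED': 'red'}


def colors_for(statuses, mapping, default='gray'):
    """Return one color per status; statuses missing from `mapping` get `default`."""
    return [mapping.get(status, default) for status in statuses]


def plot_genes(records):
    """Draw the bar chart with per-bar colors and return the bar container."""
    # Imported here so that colors_for stays usable without matplotlib.
    import matplotlib.pyplot as plt

    records = sorted(records, key=lambda rec: rec[1], reverse=True)
    X, Y, R = zip(*records)
    fig, ax = plt.subplots()
    bars = ax.bar(range(len(X)), Y, 0.6, tick_label=X,
                  color=colors_for(R, color_dict))
    ax.tick_params(axis='x', labelrotation=90)
    return bars
```

Calling `plot_genes(new)` and then `matplotlib.pyplot.show()` should display the chart. Using `mapping.get` rather than `mapping[...]` avoids a `KeyError` if a status other than the two known ones ever appears.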

Related

how to plot class labels using a distribution list

I have a dataset with train and test sets and three classes A,B,and C. I want to create a plot in which I show the distribution of data labels in each class for TRAIN and TEST sets separately (these are binary class labels 0 and 1). Ideally, I would like to show TRAIN and TEST stats in different colours, maybe in a bar chart. These are the values:
a_train = [40,75]
a_test = [10,19]
b_train=[41,75]
b_test=[10,19]
c_train=[51,75]
c_test=[12,19]
I have tried to use pyplot but was confused how to create the plot:
import numpy as np
import matplotlib.pyplot as plt
top=[(['A',[[40,75],[10,19]]]),('B',[[41,75],[10,19]]),('C',[[51,75],[12,19]])]
labels, ys = zip(*top)
xs = np.arange(len(labels))
width = 1
plt.bar(xs, ys, width, align='center')
plt.xticks(xs, labels)
plt.yticks(ys)
which gives this error:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
labels = ['a_train', 'a_test', 'b_train', 'b_test','c_train','c_test']
Positive = [40, 10, 41, 10, 51, 12]
Negative = [75, 19, 75, 19, 75, 19]
x = np.arange(len(labels))
width = 0.30 # the width of the bars
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, Positive, width, label='Positive')
rects2 = ax.bar(x + width/2, Negative, width, label='Negative')
ax.set_ylabel('Values')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
plt.show()
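The `ValueError` in the question most likely arises because `ys` is a nested list (a train/test pair of two-element lists per class) rather than one height per bar. The sketch below flattens that structure into the flat lists used in the code above. The helper name `flatten_counts` is hypothetical, and treating the first count as "positive" and the second as "negative" simply mirrors the lists above.

```python
# Sketch: turn the nested per-class structure into flat, bar-ready lists.
# `flatten_counts` is a hypothetical helper, not part of the original answer.
top = [('A', [[40, 75], [10, 19]]),
       ('B', [[41, 75], [10, 19]]),
       ('C', [[51, 75], [12, 19]])]


def flatten_counts(data):
    """Return (labels, first_counts, second_counts), one entry per class/split pair."""
    labels, first, second = [], [], []
    for name, (train, test) in data:
        for split, counts in (('train', train), ('test', test)):
            labels.append(f'{name.lower()}_{split}')
            first.append(counts[0])
            second.append(counts[1])
    return labels, first, second


labels, positive, negative = flatten_counts(top)
```

Each of the three resulting lists has one entry per bar position, which is the shape `ax.bar` expects for its heights.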

Output Values from Regression Line inside Matplotlib window

import matplotlib.pyplot as plt
import numpy as np
x = np.array([6, 15, 24, 33, 41, 52, 59, 66, 73, 81])
y = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50])
coef = np.polyfit(x, y, 1)
poly1d_fn = np.poly1d(coef) # to create a linear function with coefficients
plt.plot(x, y, 'ro', x, poly1d_fn(x), '-b')
plt.errorbar(x, poly1d_fn(x), yerr=poly1d_fn(x) - y, fmt='.k')
plt.show()
I have working code that, based on my input, produces a graph with error bars and the regression line. That's all fine. What I want now is to add a text box below the graph: when a user enters a number, e.g. 12, it should output the corresponding value on the regression line.
left, bottom, width, height = 0.15, 0.02, 0.7, 0.10
plt.subplots_adjust(left=left, bottom=0.25) # Make space for the slider
input_field = plt.axes([left, bottom, width, height])
box = TextBox(input_field, 'value')
I tried this approach, without success: I can't get it to take a value and output it in the GUI window matplotlib provides. The field would need to be checked on every input. Matplotlib offers on_text_change(self, func) or on_submit(self, func), so that might work, but how do I output the result?
Does anyone have an idea?
I would use a simple Text artist to display the result. But being fancy, I would also display lines on the graph showing the input and output values.
import matplotlib.pyplot as plt
import matplotlib.widgets # needed for matplotlib.widgets.TextBox below
import numpy as np
x = np.array([6, 15, 24, 33, 41, 52, 59, 66, 73, 81])
y = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50])
coef = np.polyfit(x, y, 1)
poly1d_fn = np.poly1d(coef) # to create a linear function with coefficients
def submit(val):
    try:
        x = float(val)
        y = poly1d_fn(x)
        ax.annotate('', xy=(x,0), xycoords=('data','axes fraction'),
                    xytext=(x,y), textcoords='data',
                    arrowprops=dict(arrowstyle='-', ls='--'))
        ax.annotate(f'{x:.2f}', xy=(x,0), xycoords=('data','axes fraction'))
        ax.annotate('', xy=(0,y), xycoords=('axes fraction','data'),
                    xytext=(x,y), textcoords='data',
                    arrowprops=dict(arrowstyle='-', ls='--'))
        ax.annotate(f'{y:.2f}', xy=(0,y), xycoords=('axes fraction','data'))
        output_box.set_text(f'Result = {y:.2f}')
        plt.draw()
    except ValueError:
        pass
fig, ax = plt.subplots()
ax.plot(x, y, 'ro', x, poly1d_fn(x), '-b')
ax.errorbar(x, poly1d_fn(x), yerr=poly1d_fn(x) - y, fmt='.k')
left, bottom, width, height, pad = 0.15, 0.02, 0.3, 0.10, 0.1
fig.subplots_adjust(left=left, bottom=0.25) # Make space for the text box
input_field = fig.add_axes([left, bottom, width, height])
text_box = matplotlib.widgets.TextBox(input_field, 'value')
text_box.on_submit(submit)
output_box = fig.text(left+width+pad, bottom+height/2, s='Result = ', va='center')
plt.show()
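The lookup that the callback above performs can be separated from the drawing code, which makes it easy to check without opening a window. This is only a sketch: the helper name `predict_from_text` is hypothetical, and returning `None` for non-numeric input is an assumption about how invalid entries should be treated.

```python
import numpy as np

x = np.array([6, 15, 24, 33, 41, 52, 59, 66, 73, 81])
y = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

# Degree-1 least-squares fit, wrapped as a callable polynomial.
poly1d_fn = np.poly1d(np.polyfit(x, y, 1))


def predict_from_text(text):
    """Return the regression value for the number in `text`, or None if it is not a number."""
    try:
        value = float(text)
    except ValueError:
        return None
    return float(poly1d_fn(value))
```

A `TextBox.on_submit` callback could call `predict_from_text(val)` and update the output text only when the result is not `None`.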

Create stacked bar with matplotlib

I have data displayed in the following format:
values = np.array([[10, 12,13, 5,20], [30, 7, 10, 25,2], [10, 12,13, 5,20]])
And I want to create a straight-up stacked bar chart like the following figure. Each element in the array belongs to a stacked bar.
I have searched to see how can I do this with matplotlib, but unfortunately, I still haven't found a way to do it. How can I do this?
AFAIK, there is no straightforward way to do it. You need to normalize the values and calculate the exact positions of the bars yourself.
import numpy as np
import matplotlib.pyplot as plt
values = np.array([[10, 12,13, 5,20], [30, 7, 10, 25,2], [10, 12,13, 5,20]])
values_normalized = values/np.sum(values, axis=0)
bottom_values = np.cumsum(values_normalized, axis=0)
bottom_values = np.vstack([np.zeros(values_normalized[0].size), bottom_values])
text_positions = (bottom_values[1:] + bottom_values[:-1])/2
r = [0, 1, 2, 3, 4] # position of the bars on the x-axis
names = ['A', 'B', 'C', 'D', 'E'] # names of groups
colors = ['lightblue', 'orange', 'lightgreen']
for i in range(3):
    plt.bar(r, values_normalized[i], bottom=bottom_values[i], color=colors[i], edgecolor='white', width=1, tick_label=['a','b','c','d','e'])
    for xpos, ypos, yval in zip(r, text_positions[i], values[i]):
        plt.text(xpos, ypos, "N=%d"%yval, ha="center", va="center")
# Custom X axis
plt.xticks(r, names, fontweight='bold')
plt.xlabel("group")
plt.show()
Adding text on top of the bars is tricky: it requires calculating their vertical positions. The code below is a refactored version of most of the code from a source I linked that explains how to add text on top of bars.
Python 3.8
matplotlib 3.3.1
numpy 1.19.1
import matplotlib.pyplot as plt
import numpy as np
values = np.array([[10, 12, 13, 5, 20], [30, 7, 10, 25, 2], [10, 12, 13, 5, 20]])
row, column = values.shape # (3, 5)
x_type = [x+1 for x in range(column)]
ind = [x for x, _ in enumerate(x_type)]
values_normalized = values/np.sum(values, axis=0)
value1, value2, value3 = values_normalized[0,:], values_normalized[1,:], values_normalized[2,:]
# Create figure
plt.figure(figsize=(8, 6))
plt.bar(ind, value1, width=0.8, label='Series1', color='#5B9BD5')
plt.bar(ind, value2, width=0.8, label='Series2', color='#C00000', bottom=value1)
plt.bar(ind, value3, width=0.8, label='Series3', color='#70AD47', bottom=value1 + value2)
# Show text
bottom_values = np.cumsum(values_normalized, axis=0)
bottom_values = np.vstack([np.zeros(values_normalized[0].size), bottom_values])
text_positions = (bottom_values[1:] + bottom_values[:-1])/2
c = list(range(column))
for i in range(3):
    for xpos, ypos, yval in zip(c, text_positions[i], values[i]):
        plt.text(xpos, ypos, yval, horizontalalignment='center', verticalalignment='center', color='white')
plt.xticks(ind, x_type)
plt.legend(loc='center', bbox_to_anchor=(0, 1.02, 1, 0.1), handlelength=1, handleheight=1, ncol=row)
plt.title('CHART TITLE', fontdict = {'fontsize': 16,'fontweight': 'bold', 'family': 'serif'}, y=1.1)
# Hide y-axis
plt.gca().axes.yaxis.set_visible(False)
plt.show()
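Both answers rely on the same arithmetic: normalize each column so it sums to 1, take a cumulative sum to find where each segment starts, and place the label halfway up each segment. A small sketch of just that calculation follows; the helper name `stack_layout` is hypothetical and introduced only for illustration.

```python
import numpy as np

values = np.array([[10, 12, 13, 5, 20], [30, 7, 10, 25, 2], [10, 12, 13, 5, 20]])


def stack_layout(values):
    """Return (normalized heights, segment bottoms, label centers), one row per series."""
    values = np.asarray(values, dtype=float)
    normalized = values / values.sum(axis=0)   # every column now sums to 1
    tops = np.cumsum(normalized, axis=0)       # upper edge of each segment
    bottoms = np.vstack([np.zeros(values.shape[1]), tops[:-1]])  # lower edge of each segment
    centers = (bottoms + tops) / 2             # vertical midpoint, used for the text
    return normalized, bottoms, centers


normalized, bottoms, centers = stack_layout(values)
```

Row `i` can then be drawn with `plt.bar(positions, normalized[i], bottom=bottoms[i])` and labeled at `centers[i]`, where `positions` stands for the x positions of the bars.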

Ylabel rescale range and end at 0%

import numpy as np
import matplotlib.pyplot as plt
n = 1000
x = np.arange(0, n)
y1 = np.random.normal(50, 4, n)
y2 = np.random.normal(25, 2.5, n)
y3 = np.random.normal(10, 1.1, n)
fig, (ax1, ax2, ax3) = plt.subplots(nrows = 3, ncols = 1)
ax1.plot(x, y1, 'royalblue')
ax1.set(xticks = [], title = 'Title')
ax2.plot(x, y2, 'darkorange')
ax2.set(xticks = [])
ax3.plot(x, y3, 'forestgreen')
ax3.set(xlabel = 'Random sample')
fig.legend(['First', 'Second', 'Third'])
plt.show()
I would like the ylabels to be shown in percentage, start at 0% and decrease. For example the blue one should go from [30, 40, 50, 60, 70] to [-57.1%, -42.9%, -28.6%, -14.3%, 0%]. The yellow one should go from [10, 20, 30, 40] to [-75%, -50%, -25%, 0%] and the green one should go from [5, 7.5, 10, 12.5, 15] to [-66.6%, -50%, -33.3%, -16.7%, 0%].
The rest of the graphs should look exactly the same, only the ylabels should change.
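Just convert your current y ticks to floats and express each one relative to the largest tick before displaying:
import numpy as np
# yvals: the current y tick values of one subplot, e.g. ax1.get_yticks()
ticks = np.array([float(v) for v in yvals])
ticks = (ticks - ticks.max()) / ticks.max() # divide by the maximum so that 30 out of 70 becomes -0.571, i.e. -57.1%
yticklabels = ['{0:.1%}'.format(t) for t in ticks]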
Do this for each plot separately.
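As a sketch of how this could be applied to one subplot, assuming the desired labels are each tick's change relative to the largest visible tick (which matches the examples in the question, e.g. 30 out of 70 becoming -57.1%). The helper names `relative_tick_labels` and `apply_relative_labels` are hypothetical.

```python
def relative_tick_labels(ticks):
    """Format each tick as its percentage change relative to the largest tick."""
    ticks = [float(t) for t in ticks]
    top = max(ticks)
    if top == 0:
        raise ValueError("largest tick is 0; a relative change is undefined")
    return ['{0:.1%}'.format((t - top) / top) for t in ticks]


def apply_relative_labels(ax):
    """Relabel the visible y ticks of a matplotlib Axes without changing its limits."""
    low, high = ax.get_ylim()
    # get_yticks() may include ticks just outside the view, so keep only visible ones.
    ticks = [t for t in ax.get_yticks() if low <= t <= high]
    ax.set_yticks(ticks)
    ax.set_yticklabels(relative_tick_labels(ticks))
    ax.set_ylim(low, high)  # restore the limits in case set_yticks changed them
```

For the figure in the question this would be called once per axes, for example `for ax in (ax1, ax2, ax3): apply_relative_labels(ax)`, before `plt.show()`.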

LineCollection not displaying data for some axes on multiple axes plot

The following code will create a plot that appears to have invisible data on several of the subplots.
Why do I then need to include ax[i].autoscale_view(True,True,True)?
Why does print ax[i].lines show [] ?
Code
from matplotlib.collections import LineCollection
import matplotlib.pyplot as plt
import numpy as np
# example data with properties:
# len(lines) == 4 and len(lines[0]) == 10
# len(x) == 10
lines = [(1.2310957583605482, 1.283772297331087, 1.61856069891319, 2.1602226857314735, 1.0277068564151643, 1.1715166081037471, 1.463648931121718, 1.2329321041327499, 1.4080120164965291, 1.2225064185740224), (0.33323810593968223, 0.32582779060567746, 0.32836534361310366, 0.51831090602571572, 0.29791484909192673, 0.35713207695246518, 0.29463171650130665, 0.34633265872428215, 0.39298012050485071, 0.410877623134692), (10, 11, 13, 17, 8, 10, 12, 10, 11, 10), (0.9911659269366481, 0.989291500800633, 0.9880005820749531, 0.9820511801663299, 0.978444258093041, 0.9737543029212308, 0.9711834357704919, 0.9632772617693266, 0.95740331184712, 0.9523058427743931)]
x = [0.0, 0.00101010101010101, 0.00202020202020202, 0.0030303030303030303, 0.00404040404040404, 0.00505050505050505, 0.006060606060606061, 0.007070707070707071, 0.00808080808080808, 0.00909090909090909]
n=len(lines) # lines and x are the example data defined above
fig, ax = plt.subplots(n, sharex=True, figsize = (8, 8))
for i, y in enumerate(lines):
    xy = list(zip(x, y)) # list() so this also works on Python 3, where zip is lazy
    lc = LineCollection([xy], linewidth = 2)
    ax[i].add_collection(lc)
plt.show()
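No answer to this question is included here. As a hedged sketch of what is probably going on, at least in the matplotlib versions this question targets: `add_collection` does not rescale the view, so each axes keeps its default 0 to 1 limits until `autoscale_view()` is called, and `ax.lines` only lists `Line2D` artists, whereas a `LineCollection` is stored in `ax.collections`. The helper names `make_segment` and `plot_collections` are assumptions for illustration.

```python
def make_segment(x, y):
    """Pair x and y into a concrete list of (x, y) points for one line segment."""
    return list(zip(x, y))


def plot_collections(x, lines):
    """Draw each entry of `lines` as a LineCollection on its own subplot."""
    # Imported here so make_segment stays usable without matplotlib.
    import matplotlib.pyplot as plt
    from matplotlib.collections import LineCollection

    fig, axes = plt.subplots(len(lines), sharex=True, squeeze=False, figsize=(8, 8))
    axes = axes[:, 0]  # squeeze=False always returns a 2-D array of axes
    for ax, y in zip(axes, lines):
        ax.add_collection(LineCollection([make_segment(x, y)], linewidth=2))
        ax.autoscale_view()  # fit the view limits to the collection's data
    return fig, axes
```

After `fig, axes = plot_collections(x, lines)`, inspecting `axes[0].collections` rather than `axes[0].lines` should show the added artist.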
