Avoid duplicate labels in matplotlib with x, y of form [[...], [...] ...]

Firstly, let me post a link to a similar post but with a slight difference.
I am having trouble creating a legend with unique labels when the input data has the form:
idl_t, idl_q = [[0, 12, 20], [8, 14, 24]], [[90, 60, 90], [90, 60, 90]]
and the plotting call is as follows:
plt.plot(idl_t, idl_q, label="Some label")
The result is that the legend shows the same label several times. The post linked above had a similar problem, but the OP there was using data of the form:
idl_t, idl_q = [1,2], [2,3]
which differs from my case, and I am not sure whether the logic there can be applied here.
So the question is: how do I avoid duplicate labels without changing the input data?

You can get the handles and labels used to build the legend and modify them. In the code below, the labels and handles are zipped into a dictionary; because dictionary keys are unique, duplicate labels are dropped. (You may want to manipulate them differently to achieve your goal.)
import matplotlib.pyplot as plt
idl_t, idl_q = [[0, 12, 20], [8, 14, 24]], [[90, 60, 90], [90, 60, 90]]
plt.plot(idl_t, idl_q, label="Some label")
# get legend handles and their corresponding labels
handles, labels = plt.gca().get_legend_handles_labels()
# zip labels as keys and handles as values into a dictionary, ...
# so only unique labels would be stored
dict_of_labels = dict(zip(labels, handles))
# use unique labels (dict_of_labels.keys()) to generate your legend
plt.legend(dict_of_labels.values(), dict_of_labels.keys())
plt.show()
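A minimal sketch of the same idea wrapped in a reusable helper. The name unique_legend is hypothetical (it is not a matplotlib function), and the Agg backend line is only an assumption so that the example runs without a display. Unlike dict(zip(labels, handles)), which keeps the last handle for each label, this version keeps the first one.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt


def unique_legend(ax):
    """Hypothetical helper: build a legend with one entry per distinct label."""
    handles, labels = ax.get_legend_handles_labels()
    unique = {}
    for handle, label in zip(handles, labels):
        # setdefault keeps the first handle seen for each label
        unique.setdefault(label, handle)
    return ax.legend(list(unique.values()), list(unique.keys()))


idl_t, idl_q = [[0, 12, 20], [8, 14, 24]], [[90, 60, 90], [90, 60, 90]]

fig, ax = plt.subplots()
ax.plot(idl_t, idl_q, label="Some label")  # draws three lines, all with the same label
legend = unique_legend(ax)
legend_texts = [text.get_text() for text in legend.get_texts()]
```

Calling the helper after all plot calls on an axes should leave a single legend entry per label, whatever the shape of the input data.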

It can be done in a single line, but I am afraid it does not seem to read that well.
fig, ax = plt.subplots()
for i, (x, y) in enumerate(zip(zip(*idl_t), zip(*idl_q))):
    ax.plot(x, y, label="Label" if i == 0 else "_nolabel_")
Maybe something like this is simpler to understand:
# one line per column, which is how plt.plot(idl_t, idl_q) draws the data
for i in range(len(idl_t[0])):
    ax.plot(
        [xval[i] for xval in idl_t],
        [yval[i] for yval in idl_q],
        label="Label" if i == 0 else "_nolabel_"
    )
ax.legend()
In case you want to plot [0, 12, 20] with [90, 60, 90], and [8, 14, 24] with [90, 60, 90]:
for i, (x, y) in enumerate(zip(idl_t, idl_q)):
    ax.plot(x, y, label="Label" if i == 0 else "_nolabel_")
ax.legend()
plt.show()
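Another variant, sketched under the assumption that one legend entry for the whole group is enough: keep the single plot call on the unchanged data and label only the first returned line. This relies on the legend skipping artists whose label starts with an underscore (which is also why "_nolabel_" works in the loops above) and on matplotlib's automatically generated default labels starting with an underscore. The Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

idl_t, idl_q = [[0, 12, 20], [8, 14, 24]], [[90, 60, 90], [90, 60, 90]]

fig, ax = plt.subplots()
lines = ax.plot(idl_t, idl_q)   # one Line2D per column of the 2D input
lines[0].set_label("Label")     # the other lines keep their hidden default labels
ax.legend()

# Inspect what the legend will actually contain
handles, labels = ax.get_legend_handles_labels()
```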

Related

Creating a phylogenetic tree with domain annotations using BioPython

I want to create a figure like so:
[Image: example of the figure I would like to create]
Here is some dummy data and attempt so far to go about this:
import io
import matplotlib.pyplot as plt
from Bio import Phylo
# input data
treedata = "(A, (B, C))"
handle = io.StringIO(treedata)
tree = Phylo.read(handle, "newick")
# domains = [[species reference, full length of protein sequence,
#             [domain reference code, start position, end position], ...], ...]
domains = [['A', 150, ['IPR000001', 10, 15], ['IPR000002', 20, 40], ['IPR000003', 70, 130]],
           ['B', 300, ['IPR000001', 70, 150], ['IPR000002', 29, 40], ['IPR000003', 100, 200]],
           ['C', 100, ['IPR000001', 5, 15], ['IPR000002', 25, 30], ['IPR000003', 27, 90]]]
# create figure and subplots
fig = plt.figure(figsize=(6, 6), dpi=300)
ax1 = fig.add_subplot(1, 2, 1) # left axis
ax2 = fig.add_subplot(1, 2, 2, sharey=ax1) # right axis
# draw dendrogram to axis 1
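# Phylo.draw returns None, so keep the existing fig;
# do_show=False defers display to the plt.show() call at the end
Phylo.draw(tree, axes=ax1, do_show=False)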
# draw rest to axis 2
# ...
# show figure
plt.show()
I have been advised to use the matplotlib bar function to plot the domains. How would I go about doing this?
P.S. If there is a much easier way of doing this in another language I am open to it, but I would prefer to do this programmatically if possible.

You could also use ETE3 for this. It can load the tree from a Newick string, and you can then attach the motifs to it. As I understand the documentation, you need a list of lists for each organism, like so:
motifs = [[start_of_motif, end_of_motif, motif_shape, motif_width, motif_height,
           foreground_color, font|size|color|label_text],
          [start_of_motif2, end_of_motif2, motif2_shape, motif2_width, motif2_height,
           foreground_color, font|size|color|label2_text]]
and so on.
So for example you could have this as
motifs_a = [[10, 15, "[]", None, 10, "green", "arial|12|black|IPR000001"],
            [20, 40, "[]", None, 10, "yellow", "arial|12|black|IPR000002"],
            [70, 130, "[]", None, 10, "red", "arial|12|black|IPR000003"]]
for your first organism, where [] for the shape means it'll be a rectangle.
You then attach it to the relevant organism. Going off ETE3's documentation, that would be:
from ete3 import Tree, SeqMotifFace, add_face_to_node
tree_with_domains = Tree("(A, (B, C))") # or Tree("path/to/newick.nwk")
protein_seq_a = "<your sequence here>"
motifs_a = [[10, 15, "[]", None, 10, "green", "arial|12|black|IPR000001"],
            [20, 40, "[]", None, 10, "yellow", "arial|12|black|IPR000002"],
            [70, 130, "[]", None, 10, "red", "arial|12|black|IPR000003"]]
organism_a_motif_face = SeqMotifFace(protein_seq_a, motifs=motifs_a)
(tree_with_domains & "A").add_face(organism_a_motif_face, 0, "aligned")
If you don't have the sequence, you can also pass seq=None to SeqMotifFace.
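For the matplotlib route mentioned in the question, here is a rough sketch of drawing the domains as horizontal bars with broken_barh. It is not the ETE3 approach above. The helper name domain_spans, the color mapping, and the row positions y = 1, 2, 3 are all assumptions made for illustration; to line the bars up with a tree drawn in a neighbouring axis, the row positions and order would have to be checked against the tip positions there. The Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

domains = [['A', 150, ['IPR000001', 10, 15], ['IPR000002', 20, 40], ['IPR000003', 70, 130]],
           ['B', 300, ['IPR000001', 70, 150], ['IPR000002', 29, 40], ['IPR000003', 100, 200]],
           ['C', 100, ['IPR000001', 5, 15], ['IPR000002', 25, 30], ['IPR000003', 27, 90]]]

# Hypothetical color per domain code
domain_colors = {'IPR000001': 'green', 'IPR000002': 'gold', 'IPR000003': 'red'}


def domain_spans(entry):
    """Return (start, width) pairs for one [name, length, [code, start, end], ...] entry."""
    return [(start, end - start) for _code, start, end in entry[2:]]


fig, ax = plt.subplots(figsize=(6, 3))
for row, entry in enumerate(domains, start=1):  # assumption: one row per species at y = 1, 2, 3
    length = entry[1]
    # Thin grey bar for the full protein length
    ax.barh(row, length, height=0.1, color='lightgrey')
    # Thicker colored blocks for the domains, drawn on top
    colors = [domain_colors[code] for code, _start, _end in entry[2:]]
    ax.broken_barh(domain_spans(entry), (row - 0.3, 0.6), facecolors=colors)

ax.set_yticks(range(1, len(domains) + 1))
ax.set_yticklabels([entry[0] for entry in domains])
ax.invert_yaxis()  # first species at the top
ax.set_xlabel('Position (residues)')

spans_a = domain_spans(domains[0])
```

Overlapping domains (as in species B) are simply drawn on top of each other here; a real figure may need an offset or transparency for those.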

How to update y-axis in matplotlib

I have a problem updating the limits of the y-axis.
My idea is to read a CSV file and plot some graphs.
When I set limits for the y-axis, they do not show up on the plot.
It always shows the values from the file.
I'm new to Python.
import matplotlib.pyplot as plt
import csv
import numpy as np
x = []
y = []
chamber_temperature = []
with open(r"C:\Users\mm02058\Documents\test.txt", 'r') as file:
    reader = csv.reader(file, delimiter = '\t')
    for row in (reader):
        x.append(row[0])
        chamber_temperature.append(row[1])
        y.append(row[10])
x.pop(0)
y.pop(0)
chamber_temperature.pop(0)
#print(chamber_temperature)
arr = np.array(chamber_temperature)
n_lines = len(arr)
time = np.arange(0,n_lines,1)
time_sec = time * 30
time_min = time_sec / 60
time_hour = time_min / 60
time_day = time_hour / 24
Fig_1 = plt.figure(figsize=(10,8), dpi=100)
plt.suptitle("Powered Thermal Cycle", fontsize=14, x=0.56, y= 0.91)
plt.subplot(311, xlim=(0, 30), ylim=(-45,90), xticks=(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30), yticks=( -40, -30, -20, -10, 0, 10, 20, 30, 40, 50, 60, 70, 80, 90), ylabel=("Temperature [°C]"))
plt.plot(time_hour, chamber_temperature, 'k', label='Temperature')
plt.gca().invert_yaxis()
plt.grid()
plt.legend(shadow=True, fontsize=('small'), loc = 'center right', bbox_to_anchor=(1.13, 0.5))
plt.show()

Your code looks suspicious, because I cannot see a conversion from strings (which is what csv.reader produces) to floating point numbers.
Your plot also looks suspicious, because the y tick labels are not sorted!
I decided to check if, by chance, Matplotlib tries to be smarter than it should...
import numpy as np
import matplotlib.pyplot as plt
# let's plot an array of strings, as I suppose you did,
# and see if Matplotlib doesn't like it, or ...
np.random.seed(20210719)
arr_of_floats = 80+10*np.random.rand(10)
arr_of_strings = np.array(["x = %6.3f"%round(x, 2) for x in arr_of_floats])
plt.plot(range(10), arr_of_strings)
plt.show()
Now, let's see what happens if we perform the conversion to floats:
# for you it's simply: np.array(chamber_temperature, dtype=float)
arr_of_floats = np.array([s[4:] for s in arr_of_strings], dtype=float)
plt.plot(range(10), arr_of_floats)
plt.show()
Finally, do not change the axes' limits (and similar settings) BEFORE plotting. Instead:
first, organize your figure if needed (figure size, subplots, etc.),
second, plot your data,
third, adjust the details of the graph and
fourth and last, commit your work using plt.show().
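The order above can be sketched as follows. The tab-separated sample text is made up and stands in for the real file, the 30-second sampling interval is taken from the question's code, and the Agg backend line is only an assumption so the example runs without a display.

```python
import csv
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical stand-in for the real tab-separated file (header row + data)
sample = "time\ttemperature\n0\t25.0\n1\t-40.0\n2\t85.5\n"

reader = csv.reader(io.StringIO(sample), delimiter="\t")
next(reader)  # skip the header row
rows = list(reader)

# csv.reader yields strings, so convert explicitly before plotting
chamber_temperature = np.array([row[1] for row in rows], dtype=float)
time_hour = np.arange(len(chamber_temperature)) * 30 / 3600  # one sample every 30 s

# 1. organize the figure, 2. plot, 3. adjust details, 4. show
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(time_hour, chamber_temperature, "k", label="Temperature")
ax.set_ylim(-45, 90)
ax.set_ylabel("Temperature [°C]")
ax.grid()
ax.legend()
# plt.show() would come last in an interactive session
```

With numeric data the y-axis is a normal continuous axis, so the limits and ticks behave as expected.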

Use
plt.ylim(bottom_limit, top_limit)  # e.g. plt.ylim(84, 86)
before your
plt.show()
and that should work!
In your code, xlim and ylim are written with an equals sign, as keyword arguments to plt.subplot.
Call them as functions instead (plt.xlim(...), plt.ylim(...)), without the equals sign.

Matplotlib how to dotplot variable number of points over time?

I'm trying to build an audio fingerprinting algorithm like Shazam.
I have a variable-length array of frequency point data like so:
[[69, 90, 172],
[6, 18, 24],
[6, 18],
[6, 18, 24, 42],
[]
...
I would like to dot-plot it like a spectrogram, sort of like this. My data doesn't explicitly have a time axis, but each row is a 0.1 s slice of time. I am aware of plt.specgram.

np.repeat can create an accompanying array of x's. It needs an array of sizes to be calculated from the input values.
Here is an example supposing the x's are .1 apart (like in the post's description, but unlike the example image).
import numpy as np
import matplotlib.pyplot as plt
# ys = [[69, 90, 172], [6, 18, 24], [6, 18], [6, 18, 24, 42]]
ys = [np.random.randint(50, 3500, np.random.randint(2, 6)) for _ in range(30)]
sizes = [len(y) for y in ys]
# x = 0.1, 0.2, ... (one value per row), repeated once per point in that row
xs = np.repeat(np.arange(1, len(ys) + 1) / 10, sizes)
plt.scatter(xs, np.concatenate(ys), marker='x', color='blueviolet')
plt.show()
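The same technique applied to the rows shown in the question, including the empty one, might look like this. Placing the first row at t = 0 s is an assumption (the answer above starts at 0.1), and the Agg backend line is only there so the example runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Rows from the question; each row is one 0.1 s slice and may be empty
ys = [[69, 90, 172], [6, 18, 24], [6, 18], [6, 18, 24, 42], []]

sizes = [len(row) for row in ys]
times = np.arange(len(ys)) * 0.1  # assumption: the first slice starts at t = 0
xs = np.repeat(times, sizes)      # one time value per frequency point
flat_ys = np.concatenate([np.asarray(row, dtype=float) for row in ys])

fig, ax = plt.subplots()
ax.scatter(xs, flat_ys, marker="x", color="blueviolet")
ax.set_xlabel("Time [s]")
ax.set_ylabel("Frequency point")
```

An empty row gets a repeat count of zero, so it contributes no points and needs no special handling.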

What does indexing the matplotlib axis do in a loop?

I saw a post on assigning the same colors across multiple pie plots in Matplotlib.
But there's something I don't understand about indexing the axis object.
Here's the code:
import numpy as np
import matplotlib.pyplot as plt
def mypie(slices,labels,colors):
    # map each label to its color
    colordict={}
    for l,c in zip(labels,colors):
        print(l, c)
        colordict[l]=c
    fig = plt.figure(figsize=[10, 10])
    ax = fig.add_subplot(111)
    pie_wedge_collection = ax.pie(slices, labels=labels, labeldistance=1.05)#, autopct=make_autopct(slices))
    for pie_wedge in pie_wedge_collection[0]:
        pie_wedge.set_edgecolor('white')
        pie_wedge.set_facecolor(colordict[pie_wedge.get_label()])
    titlestring = 'Issues'
    ax.set_title(titlestring)
    return fig,ax,pie_wedge_collection
slices = [37, 39, 39, 38, 62, 21, 15, 9, 6, 7, 6, 5, 4, 3]
cmap = plt.cm.prism
colors = cmap(np.linspace(0., 1., len(slices)))
labels = [u'TI', u'Con', u'FR', u'TraI', u'Bug', u'Data', u'Int', u'KB', u'Other', u'Dep', u'PW', u'Uns', u'Perf', u'Dep']
fig,ax,pie_wedge_collection = mypie(slices,labels,colors)
plt.show()
In the line: for pie_wedge in pie_wedge_collection[0] what does the index [0] do? The code doesn't work if I don't use it or use pie_wedge_collection[1].
Doesn't the ax object only have one plot here? I don't understand what the index is doing.

According to the Matplotlib documentation, pie() returns two or three lists:
A list of matplotlib.patches.Wedge
A list of matplotlib.text.Text labels
(conditionally) A list of matplotlib.text.Text data labels
Your code needs to manipulate the edge and face colors of the Wedge objects returned by pie(), which are in the first list (zero index) in the return value, pie_wedge_collection.
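Unpacking the return value makes the roles explicit, so no index is needed. A small sketch, with made-up slices and a hypothetical color mapping; the Agg backend line is only an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

slices = [37, 39, 62]
labels = ["TI", "Con", "Bug"]
# Hypothetical label-to-color mapping
colordict = {"TI": "tab:blue", "Con": "tab:orange", "Bug": "tab:green"}

fig, (ax_plain, ax_pct) = plt.subplots(1, 2)

# Without autopct: wedges and label texts
wedges, texts = ax_plain.pie(slices, labels=labels)
for wedge in wedges:  # same objects as pie_wedge_collection[0]
    wedge.set_edgecolor("white")
    wedge.set_facecolor(colordict[wedge.get_label()])

# With autopct: a third list holds the percentage texts
wedges_pct, texts_pct, autotexts = ax_pct.pie(slices, labels=labels, autopct="%1.0f%%")
```

Index [1] fails in the original code because it holds Text objects, which have no set_edgecolor or set_facecolor methods.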

Create a 100 % stacked area chart with matplotlib

I was wondering how to create a 100 % stacked area chart in matplotlib. On the matplotlib page I couldn't find an example for it.
Can somebody here show me how to achieve that?

A simple way to achieve this is to make sure that for every x-value, the y-values sum to 100.
I assume that you have the y-values organized in an array as in the example below, i.e.
y = np.array([[17, 19, 5, 16, 22, 20, 9, 31, 39, 8],
[46, 18, 37, 27, 29, 6, 5, 23, 22, 5],
[15, 46, 33, 36, 11, 13, 39, 17, 49, 17]])
To make sure the column totals are 100, you have to divide the y array by its column sums, and then multiply by 100. This makes the y-values span from 0 to 100, making the "unit" of the y-axis percent. If you instead want the values of the y-axis to span the interval from 0 to 1, don't multiply by 100.
Even if you don't have the y-values organized in one array as above, the principle is the same; the corresponding elements in each array consisting of y-values (e.g. y1, y2 etc.) should sum to 100 (or 1).
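As a quick check of the normalization on the array above, a minimal sketch (the variable names column_totals and percent are just illustrative):

```python
import numpy as np

y = np.array([[17, 19, 5, 16, 22, 20, 9, 31, 39, 8],
              [46, 18, 37, 27, 29, 6, 5, 23, 22, 5],
              [15, 46, 33, 36, 11, 13, 39, 17, 49, 17]])

# Sum over the rows gives one total per x-value (per column)
column_totals = y.sum(axis=0)

# Broadcasting divides every row by the column totals element-wise
percent = y / column_totals * 100

# Every column of the normalized array should now add up to 100
column_sums = percent.sum(axis=0)
```

Dropping the factor of 100 gives fractions between 0 and 1 instead.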
The code below is a modified version of the example @LogicalKnight linked to in his comment.
import numpy as np
from matplotlib import pyplot as plt
fnx = lambda : np.random.randint(5, 50, 10)
# np.vstack replaces np.row_stack, which is a deprecated alias in recent NumPy
y = np.vstack((fnx(), fnx(), fnx()))
x = np.arange(10)
# Make new array consisting of fractions of column-totals,
# using .astype(float) to avoid integer division
percent = y / y.sum(axis=0).astype(float) * 100
fig = plt.figure()
ax = fig.add_subplot(111)
ax.stackplot(x, percent)
ax.set_title('100 % stacked area chart')
ax.set_ylabel('Percent (%)')
ax.margins(0, 0) # Set margins to avoid "whitespace"
plt.show()
This gives the output shown below.
